Foreword


As  the  Navy’s  leading  laboratory  for  research  and  development 
in mapping, charting, and geodesy (MC&G), the Naval Oceanographic
and Atmospheric Research Laboratory is actively involved
in  applying  digital  MC&G  data  to  the  support  of  naval  weapons 
systems  and  in  conducting  research  to  improve  these  data. 

This  report  provides  details  of  significant  research  on  digital 
image  segmentation,  a  valuable  technique  for  improving  underwater 
mine  and  submarine  detection,  for  better  target  recognition,  and 
for  improving  the  quality  of  automated  computer  vision  output 
used  in  autonomous  digital  mapping. 


W.  B.  Moseley 
Technical  Director 


L.  R.  Elliott,  Commander,  USN 
Commanding  Officer 


Executive  Summary 


Computer  vision  is  a  rapidly  expanding  field  that  depends  on  the  capability 
to  automatically  segment  and,  thus,  to  classify  and  interpret  images.  In 
this  report,  the  primary  computer  vision  subarea — segmentation — is 
investigated.  Many  of  the  latest  publications  on  the  subject  of  segmentation 
are  detailed  in  a  survey  format.  Special  attention  is  given  to  a  few  specialized 
techniques  for  segmenting  digital  images. 

Powerful  segmentation  techniques  are  available;  however,  each  technique 
is  ad  hoc.  The  creation  of  hybrid  techniques  seems  to  be  a  promising 
future  research  area  with  respect  to  current  Navy  digital  mapping 
applications. 
A Survey of Digital Image Segmentation Algorithms


1.0  Introduction 

Computer  vision  is  a  rapidly  expanding  area  that  is 
dependent  on  the  capability  to  automatically  segment, 
classify,  and  interpret  images.  Segmentation  is  central 
to  the  successful  extraction  of  image  features  and 
their  subsequent  classification.  Image  segmentation 
techniques  can  be  grouped  into  six  categories: 
amplitude  thresholding,  component  labeling, 
boundary-based segmentation, region-based segmentation,
template matching, and texture segmentation.

During segmentation, an image is preprocessed,
which  can  involve  restoration,  enhancement,  or 
simply  representation  of  the  data.  Certain  features 
are  extracted  to  segment  the  image  into  its  key 
components.  The  segmented  image  is  routed  to  a 
classifier  or  an  image-understanding  system.  The 
image classification process maps different regions,
or segments, into one of several objects. Each object
is  identified  by  a  label.  The  image-understanding 
system  then  determines  the  relationships  between 
different  objects  in  a  scene  to  provide  a  complete 
scene  description. 

Powerful  segmentation  techniques  are  currently 
available;  however,  each  technique  is  ad  hoc.  The 
creation  of  hybrid  techniques  seems  to  be  a  future 
research  area  that  is  promising  with  respect  to  current 
Navy  digital  mapping  applications.  For  example, 
improved  digital  map  classification  techniques  could 
be  developed  for  automated  feature  extraction  of 
the  digitally  scanned  map  data  used  in  various  Navy 
aircraft  and  for  future  shipboard  electronic  chart 
systems. 

This report discusses the six categories of image segmentation
algorithms by describing each technique and comparing
different algorithms. The latest publications
that describe each technique are given in a survey-type
format. The Summary and Conclusions section
examines potential applications in which several of the
techniques are integrated to develop segmentation
methods that specifically address naval
applications.


2.0  Amplitude  Thresholding 

Amplitude thresholding, or window slicing, is
useful whenever an object is sufficiently characterized
by the amplitude features. Thresholding
techniques are also useful in segmenting such binary
images as printed documents, line drawings, and
multispectral and x-ray images. Some commonly used
approaches to thresholding follow:

•  Examine  the  histogram  of  the  image  to  identify 
peaks  and  valleys.  If  the  image  is  multimodal,  then 
the  valleys  can  be  used  for  selecting  thresholds. 

•  Perform  thresholding  so  that  a  predetermined 
fraction  of  the  total  number  of  samples  is  below 
the  threshold. 

• Adaptively threshold by examining local (neighborhood)
histograms.

•  Selectively  threshold  by  examining  histograms 
of  only  those  points  that  satisfy  a  chosen  criterion. 
For  example,  in  low-contrast  images,  the  histogram 
of  pixels  whose  Laplacian  magnitude  is  above  a 
predefined  value  will  exhibit  clearer  bimodal  features 
than  that  of  the  original  image. 

•  Determine  the  threshold  to  minimize  the 
probability  of  error  or  some  other  quantity,  for 
instance,  Bayes’  risk,1  if  a  probabilistic  model  of 
the  different  segmentation  classes  is  known. 
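
The second approach in the list above (placing the threshold so that a predetermined fraction of the samples falls at or below it) can be sketched as follows. This is a minimal illustration, not a method taken from the cited literature; the function name `ptile_threshold` and the sample gray levels are assumptions made for the example.

```python
import math


def ptile_threshold(pixels, fraction):
    """Return the smallest gray level T with at least `fraction` of pixels <= T."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    ordered = sorted(pixels)
    # Index of the last sample that must fall at or below the threshold.
    index = math.ceil(fraction * len(ordered)) - 1
    return ordered[index]


# Hypothetical 4 x 4 image, flattened: a bright 2 x 2 object on a dark background.
pixels = [10, 12, 11, 13,
          11, 90, 95, 12,
          10, 92, 94, 11,
          12, 11, 10, 13]

# Assume the background is known to cover about 75% of the image.
T = ptile_threshold(pixels, 0.75)
```

With this threshold, the twelve darkest samples are at or below T and only the four bright object pixels exceed it.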

Multilevel  thresholding  is  generally  less  reliable 
than  its  single-threshold  counterpart  because 
establishing  multiple  thresholds  that  effectively 
isolate  regions  of  interest,  especially  when  the 
number  of  corresponding  histogram  modes  is  large, 
is  difficult.  Problems  of  this  nature,  if  handled  by 
thresholding,  are  best  addressed  by  a  single, 
variable  threshold. 

Mathematically,  thresholding  can  be  viewed  as 
an  operation  that  involves  tests  against  a  function 
T  of  the  form 

$$T = T[x, y, p(x, y), f(x, y)], \tag{1}$$

where f(x, y) is the gray level of point (x, y), and
p(x, y) denotes some local property of this point:
for example, the average gray level of a neighborhood
centered at (x, y). It follows that a thresholded
image g(x, y) is created by defining

$$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T \\ 0 & \text{if } f(x, y) \le T \end{cases} \tag{2}$$


Therefore,  in  examining  g(x,  y),  pixels  that  are 
labeled  1  (or  any  other  convenient  intensity  level) 
correspond  to  objects,  and  pixels  that  are  labeled 
0  correspond  to  the  background.  When  T  depends 
only on f(x, y), the threshold is called global.
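
Equations (1) and (2) can be illustrated with a short sketch. It assumes the image is a list of rows of gray levels and takes the local property p(x, y) to be the mean of a square neighborhood, as in the example given above; the function names, the `offset` parameter, and the sample image are hypothetical.

```python
def threshold_global(image, t):
    """Equation (2) with a constant threshold T: 1 where f(x, y) > T, else 0."""
    return [[1 if f > t else 0 for f in row] for row in image]


def local_mean(image, x, y, radius=1):
    """p(x, y): average gray level of the neighborhood centered at (x, y)."""
    rows, cols = len(image), len(image[0])
    values = [
        image[j][i]
        for j in range(max(0, y - radius), min(rows, y + radius + 1))
        for i in range(max(0, x - radius), min(cols, x + radius + 1))
    ]
    return sum(values) / len(values)


def threshold_local(image, offset=0.0, radius=1):
    """Variable threshold T = p(x, y) + offset, evaluated at every pixel."""
    return [
        [1 if f > local_mean(image, x, y, radius) + offset else 0
         for x, f in enumerate(row)]
        for y, row in enumerate(image)
    ]


image = [
    [10, 12, 11, 13],
    [11, 90, 95, 12],
    [10, 92, 94, 11],
    [12, 11, 10, 13],
]
binary_global = threshold_global(image, 50)
binary_local = threshold_local(image)
```

The global version uses one value of T for the whole image, whereas the local version recomputes T at each pixel from its neighborhood, which is one way to realize a single variable threshold.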

A  simple  approach  that  is  often  useful  for 
segmenting  an  image  consists  of  dividing  the  gray 
scale  into  bands  and  using  thresholds  to  determine 
regions  or  to  obtain  boundary  points.2  Smoothing 
is  also  a  key  component  related  to  thresholding. 
The  gray-level  subpopulations  that  correspond  to 
different  types  of  regions  in  a  picture  will  often 
overlap.  Under  these  circumstances,  segmenting  the 
picture  into  regions  by  thresholding  becomes 
difficult: wherever the threshold is placed, the overlapping
subpopulations cannot be cleanly separated.
This  problem  can  usually  be  alleviated  by  smoothing 
the  picture  before  thresholding  it.  For  example,  the 
picture  could  be  locally  averaged  by  replacing 
the  gray  level  at  each  point  with  an  average  of  the 
neighboring  pixels’  gray  levels.  Within  a  given  type 
of  region,  averaging  dampens  local  gray-level 
fluctuations and, hence, reduces the gray-level variability
while preserving the mean gray level.
However,  averaging  also  blurs  the  borders  of  the 
regions;  thresholding  will  still  extract  the  regions 
more  or  less  correctly,  although  it  will  smooth  out 
irregularities  in  their  borders.3 
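
The effect of local averaging on gray-level variability can be checked with a small sketch. The 3 x 3 box average, the function names, and the noisy sample region are assumptions chosen for illustration; at the image border the window is simply truncated, so the mean is preserved only approximately there.

```python
def box_smooth(image, radius=1):
    """Replace each gray level with the average of its (truncated) neighborhood."""
    rows, cols = len(image), len(image[0])
    smoothed = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(rows, y + radius + 1))
                for i in range(max(0, x - radius), min(cols, x + radius + 1))
            ]
            out_row.append(sum(window) / len(window))
        smoothed.append(out_row)
    return smoothed


def mean_and_variance(image):
    """Mean and population variance of all gray levels in the image."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical region of one type: constant level near 50 with small fluctuations.
noisy = [
    [48, 52, 49, 51],
    [52, 47, 53, 50],
    [49, 51, 48, 52],
    [51, 50, 52, 47],
]
mean_before, var_before = mean_and_variance(noisy)
mean_after, var_after = mean_and_variance(box_smooth(noisy))
```

After smoothing, the variance within the region drops sharply while the mean stays close to its original value, so the gray-level subpopulations of different regions overlap less when a threshold is applied.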


3.0  Component  Labeling  Segmentation 

Component  labeling  is  a  simple  and  effective 
method  of  segmenting  binary  images  by  examining 
the  connectivity  of  pixels  with  their  neighbors  and 
then  labeling  the  connected  sets.  Two  practical 
algorithms,  pixel  labeling  and  run-length  connectivity 
analysis,  are  discussed  in  the  following  sections. 

3.1  Pixel  Labeling 

Suppose  a  binary  image  is  raster-scanned  from 
right to left and from top to bottom. The current
pixel, say x, is labeled as belonging to either an
object (with the pixel value set to 1) or a hole (0)
by examining its connectivity to its neighbors. If
two or more qualified objects are present, then those
objects are declared to be equivalent and are merged.
A new object label is assigned when a transition
from a set of 0s to an isolated 1 is detected. Once
the pixel is labeled, the features of that object are
updated. At the end of the scan, such features as
the centroid, area, and perimeter are saved for each
region of connected 1s.
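
A minimal sketch of raster-scan pixel labeling follows. It makes several assumptions not stated in the text: 4-connectivity (upper and left neighbors), a left-to-right scan (the scan direction does not change which pixels end up connected), a union-find table to record equivalent labels, and area as the only feature accumulated. The function name `label_components` is hypothetical.

```python
def label_components(binary):
    """Label 4-connected regions of 1s; return the label image and region areas."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[k] is the equivalence parent of label k; 0 is background

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # shorten the chain while walking it
            k = parent[k]
        return k

    # First pass: assign provisional labels and record equivalences.
    for y in range(rows):
        for x in range(cols):
            if binary[y][x] == 0:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            neighbors = [n for n in (up, left) if n]
            if not neighbors:
                parent.append(len(parent))  # isolated 1: start a new object
                labels[y][x] = len(parent) - 1
            else:
                roots = [find(n) for n in neighbors]
                root = min(roots)
                labels[y][x] = root
                for r in roots:
                    parent[r] = root  # declare the touching objects equivalent

    # Second pass: resolve equivalences and accumulate the area feature.
    areas = {}
    for y in range(rows):
        for x in range(cols):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = root
                areas[root] = areas.get(root, 0) + 1
    return labels, areas


binary = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
labels, areas = label_components(binary)
```

In the sample image the two vertical strokes receive different provisional labels and are merged when the scan reaches the pixel that joins them, leaving two objects with areas 7 and 1.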

3.2  Run-Length  Connectivity  Analysis 

An  alternate  method  of  segmenting  binary  images 
is  to  analyze  the  connectivity  of  run  lengths  from 
successive  scan  lines.  Consider  black  and  white  runs 
denoted  A,  B,  C,  etc.  A  segmentation  table  is  created 
and  run  A  of  the  first  scan  line  is  entered  into  the 
first  column.  The  object  of  run  A  is  named  A'. 
The  first  run  of  the  next  scan  line,  B,  has  the  same 
color  as  A  and  overlaps  A.  Hence,  B  also  belongs 
to  object  A'  and  is  placed  underneath  A  in  the  first 
column.  Run  C  has  a  different  color,  so  it  is  placed 
in  a  new  column  for  an  object  labeled  B'.  Run  D 
has  the  same  color  as  A  and  overlaps  A.  Since  both 
B  and  D  overlap  A,  divergence  is  said  to  have 
occurred,  and  a  new  column  of  object  A'  is  created, 
in  which  D  is  placed.  A  divergence  flag,  ID1,  is 
set in this column to indicate that object B' has
caused  this  divergence.  Another  flag,  ID2  of  B' 
(column  2),  may  be  set  to  A'  to  indicate  that  object 
B  has  caused  divergence  in  overlap  with  another 
run,  U,  which  sets  the  convergence  flags  IC1  to  C' 
in  column  4  and  IC2  to  B'  in  column  6.  Similarly, 
W  sets  the  convergence  flag  IC2  to  A'  in  column  2, 
and  column  5  is  labeled  as  belonging  to  object  A'. 

In  this  manner,  all  the  objects  with  different  closed 
boundaries are segmented in a single pass. The
segmentation  table  lists  the  data  relevant  to  each 
object.  The  convergence  and  divergence  flags  also 
give  the  hierarchy  structure  of  the  object.  Since  B' 
causes  divergence  as  well  as  convergence  in  A', 
and  since  C'  has  a  similar  relationship  with  B',  the 
objects  A',  B',  and  C'  are  assigned  levels  1,  2,  and 
3,  respectively.1 
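
The run-based grouping can be sketched in simplified form. The sketch below extracts the runs of each scan line and places two runs in the same object when they have the same color and overlap on successive lines. It is an assumption-laden simplification: it omits the segmentation table columns and the divergence and convergence flags described above, and it uses a union-find table instead. The function names are hypothetical.

```python
def runs_of(row):
    """Split one scan line into (start, end, color) runs; end is exclusive."""
    runs, start = [], 0
    for x in range(1, len(row) + 1):
        if x == len(row) or row[x] != row[start]:
            runs.append((start, x, row[start]))
            start = x
    return runs


def link_runs(image):
    """Group runs into objects: same color and overlapping on successive lines."""
    parent = {}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    previous = []
    for y, row in enumerate(image):
        current = [(y, s, e, c) for s, e, c in runs_of(row)]
        for run in current:
            parent[run] = run
            for other in previous:
                same_color = run[3] == other[3]
                overlap = run[1] < other[2] and other[1] < run[2]
                if same_color and overlap:
                    parent[find(run)] = find(other)
        previous = current

    objects = {}
    for run in parent:
        objects.setdefault(find(run), []).append(run)
    return list(objects.values())


# Hypothetical image: a white ring (1) on a black background (0) with a black hole.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
objects = link_runs(image)
```

The sample image yields three nested objects (background, ring, and hole), which corresponds to the three-level hierarchy of A', B', and C' discussed above. In the ring, one run splits into two on the next line and the two rejoin on the line after, which are the divergence and convergence events that the flags record.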

4.0  Boundary-Based  Segmentation  and 
the  Hough  Transform 

Boundary  extraction  techniques  segment  objects 
on  the  basis  of  their  profiles.  Therefore,  such 
techniques as contour following, connectivity, edge
linking, graph searching, curve fitting, Hough
transform, and others are applicable to image
segmentation. Difficulties with boundary-based
methods occur when objects are touching or overlapping,
or if a break occurs in the boundary due to
noise or artifacts in the image.1

It has long been recognized that the Hough
Transform (HT) represents a near-exclusive technique
for shape and motion analysis in images that contain
noisy,  missing,  and  extraneous  data.  But  its  adoption 
has  been  slow  because  of  computational  complexity 
and  storage  problems,  as  well  as  the  lack  of  a  detailed 
understanding  of  its  properties.  However,  in  recent 
years  much  progress  has  been  made  in  these  areas. 
An  efficient  implementation  of  the  HT  and  results 
on  analytic  and  empirical  performance  of  various 
methods  are  discussed  in  this  section. 

The  HT  was  first  introduced  to  detect  complex 
patterns of points in binary image data by determining
specific values of parameters that characterize
these  patterns.  Spatially  extended  patterns  are 
transformed  to  produce  spatially  compact  features 
in  a  space  of  possible  parameter  values.  The  HT 
converts  a  difficult  global  detection  problem  in  image 
space  into  a  more  easily  solved,  local  peak-detection 
problem  in  a  parameter  space. 

When  the  HT  is  calculated  on  a  digital  computer, 
the  continuous  parameter  space  is  usually  considered 
to be composed of the union of a number of finite-sized
regions. In standard implementations the space
is  partitioned  into  suitably  sized,  multidimensional 
rectangles.  Each  rectangle  is  associated  with 
an  element  of  a  multidimensional  array  called  an 
accumulator  array.  The  elements  of  the  accumulator 
array act as counters and are incremented when a
hypersurface from the back-projection of an image
point passes through the region of parameter space
associated with the element. When several image
points back-project to the same parameter
combinations, i.e., their hypersurfaces either intersect
or pass close to one another, then the corresponding
array element accumulates a large value.

The  HT  can  be  viewed  as  an  evidence-gathering 
procedure.  Each  image  point  “votes”  for  all  parameter 
combinations  that  could  have  produced  it,  if  it  were 
part  of  the  sought-after  shape.  The  votes  are  counted 
in  the  accumulator  array,  and  the  final  totals  indicate 
the  relative  likelihood  of  shapes  described  by 
parameters  within  the  corresponding  parameter  cell. 
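
The voting procedure can be sketched for the common straight-line case. The sketch assumes the normal parameterization rho = x cos(theta) + y sin(theta), a rectangular accumulator with one-degree and one-pixel cells, and a simple global maximum as the peak detector; the function names, the quantization values, and the sample points are all hypothetical choices.

```python
import math


def hough_accumulate(points, n_theta=180, rho_step=1.0, rho_max=100.0):
    """Each point votes for every (rho, theta) cell of a line that could pass through it."""
    n_rho = int(round(2 * rho_max / rho_step)) + 1
    acc = [[0] * n_theta for _ in range(n_rho)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            r = round((rho + rho_max) / rho_step)  # index of the rho cell
            if 0 <= r < n_rho:
                acc[r][t] += 1
    return acc


def strongest_peak(acc, n_theta=180, rho_step=1.0, rho_max=100.0):
    """Return (votes, rho, theta) for the accumulator cell with the most votes."""
    votes, r, t = max(
        (cell, r, t) for r, row in enumerate(acc) for t, cell in enumerate(row)
    )
    return votes, r * rho_step - rho_max, math.pi * t / n_theta


# Hypothetical feature points on the horizontal line y = 5.
points = [(x, 5) for x in range(60)]
votes, rho, theta = strongest_peak(hough_accumulate(points))
```

All 60 points vote for the cell at rho = 5, theta = 90 degrees, so the line appears as a single compact peak, whereas at other angles the votes spread over several rho cells.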

The HT is closely related to template-matching
techniques described later in this report. One obvious
difference is that template-matching is carried out
entirely in the image domain. For this reason, the
HT was included here instead of in the template-matching
section. Unlike template-matching
techniques, the HT always assumes a match between
a given basic template point and a selected image
point, and then calculates the transformation
parameters that connect them. Thus, although the
HT and template-matching calculate the same
quantity, the HT is more efficient because it does
not generate unessential data.

HT  methods  offer  many  desirable  features.  First, 
each image point is treated independently; therefore,
the method can be implemented using more
than one processing unit: i.e., parallel processing
of all points is possible. This makes the HT well
suited to real-time applications and a possible
module for shape detection in biological systems.
Second, the HT's independent combination of
evidence means that it can recognize partial or
slightly deformed shapes.

Occlusion  is  a  severe  problem  for  most  other 
shape  detection  techniques,  but  the  HT  degrades 
gracefully  because,  to  first-order  approximation,  the 
size  of  a  parameter  peak  is  directly  proportional  to 
the  number  of  matching  boundary  and  template 
points. The size and spatial localization of the peak
provide a measure of similarity in shape and mode.
Third, the HT method is robust when random data
are introduced by poor image segmentation. Random
image  points  are  unlikely  to  contribute  coherently 
to  a  single  bin  of  the  accumulator  and  thus  produce 
only  a  low-level  background  of  counts  in  the  array. 

A  more  serious  problem  than  random  data  is  data 
from  the  boundaries  of  shapes  other  than  those  being 
searched for. These boundaries can produce structured
backgrounds, and some care must be taken to
either  eliminate  or  identify  such  situations.  Finally, 
the  HT  can  simultaneously  accumulate  evidence  for 
several  examples  of  a  particular  shape  class  that 
occurs  in  the  same  image.  In  general,  each  instance 
of  the  shape  simply  produces  a  distinct  peak  or 
cluster  in  the  accumulator  array. 

The principal disadvantage of the standard implementation
of the HT is its large storage and
computational requirements. The determination of
q parameters, each resolved into z intervals, requires
an accumulator of z^q elements, which can be
prohibitively  large  if  either  z  or  q  is  large.  The 
major  computational  cost  of  the  algorithm  is  the 
calculation of the intersections between parameter cells
and parameter surfaces. In the simplest case, the parameter
surface spans (q - 1) of the q parameter dimensions,
so  the  number  of  calculations  is  massive  and 
increases  exponentially  with  the  dimensionality  of 
the  problem.  The  efficiency  of  the  HT  can  be 
increased  by  devising  methods  that  use  small-sized 
accumulators  or  that  use  extra  data  to  restrict  the 
range  of  parameters  to  be  addressed. 
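
The storage growth can be made concrete with a short worked calculation. The value z = 100 intervals per parameter is an illustrative assumption, not a figure from the cited work; N denotes the number of accumulator elements.

```latex
\begin{align*}
N &= z^{q} \\
q = 2 \ (\text{straight lines}):\quad N &= 100^{2} = 10^{4} \\
q = 3 \ (\text{circles}):\quad N &= 100^{3} = 10^{6} \\
q = 5 \ (\text{ellipses}):\quad N &= 100^{5} = 10^{10}
\end{align*}
```

Each additional parameter multiplies the accumulator size by z, which is why methods that shrink the accumulator or restrict the parameter range are attractive.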

Early  analytical  work  on  the  properties  and 
performance  of  the  HT  concentrated  on  the  effect 
of  statistical  measurement  error  on  the  position  and 
the localization of parameter peaks. Shapiro4,5,6
researched  the  variance  of  parameter  estimates  as  a 
function  of  measurement  error  for  transforms  in 
which  each  image  or  feature  point  produced  a  single 
vote  in  parameter  space.  Sklansky7  suggested  a 
geometric  construction  for  straight-line  detection 
that  could  be  used  to  investigate  the  precision  of 
curves  derived  from  estimated  parameters.  This 
graphical  technique  was  extended  by  Shapiro  and 
Iannino5 to address the case of noisy image measurements,
and was used to derive results relating
quantization  errors  to  the  accuracy  of  parameter 
estimation.  These  guides  proved  useful  in 
determining  accumulator  quantization. 

Maitre's9  work  is  the  most  recent  concerning  the 
effects  of  random  image  noise  on  the  density  of 
counts  in  parameter  space. 

The  HT  has  proven  valuable  for  solving  many 
machine  vision  problems,  since  straight  lines  and 
simple polygons occur in most natural and man-made
scenes. For example, a remotely sensed image
of an inhabited area will contain an abundance of
linear features (e.g., roads and railroads) and simple
polygonal features (e.g., buildings, parks, and farm
fields).  Even  complex  objects  can  often  be  identified 
by  their  distinctive  combination  of  these  basic 
features. 

One  of  the  main  characteristics  of  the  HT  is  that 
it  consists  of  a  series  of  fairly  simple  calculations 
carried  out  independently  on  every  feature  in  an 
image.  The  following  text  discusses  recent 
developments  in  the  implementation  of  the  HT  with 
real-time  hardware,  as  well  as  efforts  to  capture  the 
HT’s  inherent  parallelism  on  specialized  parallel 
architectures.  Most  of  these  efforts  consider  only 
the implementation of the (ρ, θ) line-finding HT.

Hanahara et al.10 implemented a (ρ, θ) line-finding
HT that pipelines the ρ intersection
calculation and the accumulator increments. They
implemented their system in standard TTL (transistor-transistor
logic) medium- and small-scale
integration circuits using a Motorola MC68000 as
the main processor. The process, which includes
edge detection, HT calculation, accumulation, and
peak detection, was found to take 0.79 seconds for
1024 feature points. Baringer11 proposed an architecture
called PPPE (parallel pipeline projection
engine), which uses the ideas of the Radon transform
as a projection operation to produce a real-time
hardware implementation of the HT. A set of VLSI
(very large scale integration) chips is currently being
designed, and achievement of real-time implementation
of the Radon transform using only one or
two ICs is expected.

The Radon transform of a function is defined as
its line integral along a line inclined at an angle θ
from the y-axis and at a distance s from the origin.
Basically, the Radon transform maps the spatial
domain to the distance/angle domain, sometimes
referred to as the (s, θ) space.
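
Written out, the definition above takes the following standard form. The symbol g for the transform and the use of the Dirac delta function to select the line are notational assumptions; f(x, y) is the image function.

```latex
g(s, \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
  f(x, y)\, \delta(x \cos\theta + y \sin\theta - s)\, dx\, dy
```

For a binary image of feature points, g(s, θ) amounts to a count of the points lying on the line x cos θ + y sin θ = s, which is the quantity that the line-finding HT accumulates.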

Several authors have investigated the implementation
of the HT on currently available SIMD (single
instruction multiple data) architectures, a type of
digital  computer.  These  architectures  usually  consist 
of  square  arrays  of  simple  processing  elements  (PEs) 
connected  so  that  each  can  communicate  with  its 
four  or  eight  neighbors.  All  processors  concurrently 
execute  the  same  instructions  on  different  items  of 
data. 

Li12  considered  two  schemes  for  running  his  fast 
HT  (FHT)  on  SIMD  architecture.  In  the  first  scheme, 
each  PE  is  assigned  an  image  feature,  and  the 
coordinates  of  a  parameter  cell  are  broadcast 
simultaneously  to  every  PE  by  a  central  controller. 
Each PE decides whether the hypersurface generated
by  its  image  feature  intersects  the  cell;  if  so,  the 
PE  sends  a  vote  back  to  the  controller.  The  votes 
from  each  PE  can  be  summed  by  the  central 
controller  and  stored  for  later  analysis.  In  the  second 
scheme,  each  PE  is  assigned  a  volume  of  parameter 
space and the image features are broadcast. The
choice  of  method  depends  on  the  number  of  available 
PEs,  the  number  of  image  features,  and  the  number 
of  parameter  cells.  For  the  standard  HT  the  number  of 
parameter  cells  increases  exponentially  with 
dimensionality  of  the  problem;  therefore,  the  first 
alternative  is  likely  to  be  the  most  feasible. 

Little  et  al.13  describe  a  possible  implementation 
of  the  HT  on  an  architecture  called  the  connection 
machine. This architecture is similar to the SIMD,
but in addition to PEs communicating with near
neighbors, a hardware router implements rapid
communication between any pair of processors. The
architecture is based on a 12-dimensional hypercube
such that every processor can be reached from any
other processor by traversing, at the most, 12 edges
of  the  cube.  Their  paper  concentrates  on  aspects  of 
programming  and  addressing  but  gives  no  data  on 
the  efficiency  gained  by  using  this  parallel 
implementation. 

Guerra  and  Hambrusch14  presented  two  efficient 
algorithms  for  HT  line-finding  on  an  n  x  n  mesh 
using  the  massively  parallel  processor  (MPP).  Their 
first  method,  the  block  algorithm,  involves 
partitioning  the  mesh  into  submeshes,  performing 
projections  in  these  submeshes,  and  then  combining 
partial  results.  Their  second  method  is  similar  in 
that  it  projects  by  tracing  lines  through  the  image 
in  a  pipeline  fashion.  Although  this  tracing  algorithm 
is  asymptotically  optimal  in  terms  of  complexity, 
Guerra  and  Hambrusch  expect  the  block  algorithm 
to  outperform  it  in  actual  implementation. 

The  HT  has  attracted  attention  from  researchers 
interested  in  human  vision,  since  the  HT  is  a  prime 
example  of  the  ideas  of  the  connectionist  school  of 
artificial  intelligence.15  The  unifying  principle  of  this 
approach  to  intelligence  is  that  low-  and  medium- 
level  vision  tasks  are  done  by  massively  parallel, 
cooperative  computations  on  large  networks  of  simple 
neuron-like  units.  Low-level,  pixel-based  properties, 
such  as  edge  or  gray-level  estimates,  can  be 
represented  by  nodes  in  a  separate  parameter 
network.  Each  node  records  a  measure  of  confidence 
for  the  occurrence  of  its  feature  or  parameter  value; 
direct  connections  between  the  two  networks  define 
ways  in  which  nodes  can  influence  these  confidence 
values.  Different  connection  patterns  can  be  used 
to  impose  different  image-space  to  parameter-space 
mappings;  i.e.,  connections  can  be  established  so 
that  if  many  low-level  units  that  lie  on  a  straight 
line  have  high  confidence,  then  the  higher  level 
unit  describing  the  parameters  of  this  line  will  acquire 
a  large  confidence  value.  The  major  characteristic 
of  this  implementation  is  the  tremendous  number  of 
feature  and  parameter  units  needed  and  the  very 
large  number  of  connections  required  between  them. 

Blanford’s16  adaptation  of  the  dynamically 
quantized  pyramid  method  can  be  naturally  mapped 
to  a  parallel  pyramid  machine.  However,  one  possible 
problem  with  this  algorithm  is  that  it  requires  some 
multiplication  and  division  operations,  which  are 
inefficient  on  the  simplest  bit  serial  processors. 

Fischler and Firschein17 showed the HT to be an
algorithm that can be implemented on a blackboard
or database architecture. They invoke a maxim, which
they call parallel guessing, that says that it is often
computationally beneficial to try to guess a solution
rather than to exhaustively compute a solution. They
suggest computing the HT incrementally and then
terminating computation when a sufficiently
significant parameter peak has been identified.18

An approach using statistical signal detection
theory  is  effective  for  curve  detection  in  digital 
images  corrupted  by  random  noise.19  This  approach 
is  a  refinement  of  the  HT  and  results  in  improved 
performance,  both  in  deciding  the  presence  or 
absence  of  a  curve  in  the  image  and  in  determining 
the  location  of  an  existing  curve  in  the  image. 
Location  estimation  performance  is  measured  by 
deriving  equations  for  both  the  HT  and  the  signal 
detection  theory  for  the  probability  of  correctly 
estimating  the  location  of  a  curve  in  noise.  The 
performances of these two approaches are compared
for  various  signal-to-noise  ratios  (SNR)  and  found 
to  be  significantly  different  for  some  SNR  values. 


5.0  Region-Based  Segmentation 

Region-based  segmentation  techniques  are 
primarily  used  to  identify  various  regions  with  similar 
features in one image. Region-based approaches are
generally less sensitive to noise than the boundary-based
methods. However, they can be considerably
more  complex  to  implement.1 

Many  region-based  segmentation  techniques  are 
presented  in  this  section,  including  region-growing 
and merging, relaxation labeling, symmetric nearest
neighbor, hierarchical segmentation, and shadow
boundary segmentation. Several well-known image
processing techniques are described in the context
of region-based segmentation, such as clustering,
pattern recognition, edge-detection, noise reduction,
and three-dimensional object recognition. An interesting
application of region-based segmentation is
discussed last: segmentation of handwritten numerical
strings.

5.1  A  Note  on  Color 

The  analysis  of  color  images  has  received 
relatively  little  attention  in  computer  vision  research, 
even  though  color  plays  an  important  role  in  human 
vision  and  provides  useful  information  for  many 
image  analysis  applications.  One  simple,  powerful 
method  for  region-based  segmentation  of  color 
images  uses  edge-preserving  filters.20  The  method 
uses  a  new  measure  of  color  edge  information  based 
on  histograms  of  absolute  color  differences.  This 
measure can be used for smoothing, segmentation,
and edge detection. Methods for multi-edge-preserving
smoothing and region-based segmentation
were  also  developed.20  A  global  histogram  of  absolute 
color  (or  gray  scale)  differences  provides  a  good 
measure  of  edge  information  in  an  image,  because 
the  likelihood  that  an  absolute  color  difference  occurs 
in  the  interior  of  a  region  decreases  monotonically 
with  increasing  magnitude  of  the  difference. 
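
A global histogram of absolute differences can be sketched for the gray-scale case. This is a simplified illustration and not the cited authors' color measure: it assumes differences are taken between horizontally and vertically adjacent pixels only, and the function name and sample image are hypothetical.

```python
from collections import Counter


def abs_difference_histogram(image):
    """Histogram of |difference| between each pixel and its right and lower neighbors."""
    hist = Counter()
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                hist[abs(value - row[x + 1])] += 1
            if y + 1 < len(image):
                hist[abs(value - image[y + 1][x])] += 1
    return hist


# Hypothetical image: two nearly flat regions separated by a vertical edge.
image = [
    [10, 10, 50, 50],
    [10, 11, 50, 51],
]
hist = abs_difference_histogram(image)
```

In this example the small differences (0 and 1) come from region interiors and account for most of the counts, while the two large differences (39 and 40) come from pixel pairs that straddle the edge.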

5.2  Region-Growing  and  Merging 

One  class  of  region-based  techniques  involves 
region-growing.  The  image  is  divided  into 
atomic  regions  of  constant  gray  levels.  Similar 
adjacent  regions  are  merged  sequentially  until 
the  adjacent  regions  become  sufficiently  different. 
The  crux  of  this  procedure  is  the  selection  of  the 
merging  criterion.  Some  merging  heuristics  follow: 

• Merge two regions, Ri and Rj, if w/Pm > θ1,
where Pm = min(Pi, Pj); Pi and Pj are the perimeters
of Ri and Rj; and w is the number of weak
boundary locations (pixels on either side have a
magnitude difference less than some threshold y).
The parameter θ1 controls the size of the region to
be merged. For example, θ1 = 1 implies that two
regions will be merged only if one of the regions
almost surrounds the other. Typically, θ1 = 0.5.

• Merge Ri and Rj if w/l > θ2, where l is the
length of the common boundary between the two
regions. Typically, θ2 = 0.75. The two regions are
merged if the boundary is sufficiently weak. This
step is often applied after the first heuristic has
been used to reduce the number of regions.

•  Merge  Ri  and  Rj  only  if  there  are  no  strong 
edge  points  between  them.  Note  that  the  run-length 
connectivity  method  for  binary  images  can  be 
interpreted  as  an  example  of  this  heuristic. 

• Merge Ri and Rj if their similarity distance is
less than a threshold.

Instead of merging regions,
the segmentation problem can be approached by
splitting a given region. For example, the image
could be split by a quad-tree approach and then
similar regions could be merged.
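
The first two merging heuristics above can be sketched as simple tests. The sketch assumes the common boundary is supplied as a list of gray-level pairs, one pair per boundary location, and uses the typical parameter values quoted above as defaults; the function names and the sample data are hypothetical.

```python
def count_weak(boundary_pairs, y):
    """w: number of boundary locations whose gray-level difference is below y."""
    return sum(1 for a, b in boundary_pairs if abs(a - b) < y)


def merge_by_perimeter(w, perimeter_i, perimeter_j, theta1=0.5):
    """First heuristic: merge if w / min(Pi, Pj) exceeds theta1."""
    return w / min(perimeter_i, perimeter_j) > theta1


def merge_by_boundary(w, common_length, theta2=0.75):
    """Second heuristic: merge if w / l exceeds theta2."""
    return w / common_length > theta2


# Hypothetical common boundary of length 8 between regions Ri and Rj.
pairs = [(10, 12), (11, 10), (12, 40), (10, 11),
         (13, 12), (11, 45), (10, 9), (12, 13)]
w = count_weak(pairs, y=5)

first = merge_by_perimeter(w, perimeter_i=10, perimeter_j=40)
second = merge_by_boundary(w, common_length=len(pairs))
```

Here six of the eight boundary locations are weak. The first test passes (6/10 exceeds 0.5), whereas the second does not (6/8 does not exceed 0.75), which shows that the two heuristics can disagree on the same pair of regions.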

5.3  Hierarchical  Segmentation 

Gambotto’s21 hierarchical segmentation algorithm
can process all regions (in a region-growing manner)
in  a  parallel  and  recursive  fashion.  This  algorithm 
simultaneously  computes  the  statistical  properties 
of  homogeneous  regions,  as  well  as  a  gradient 
estimate  over  the  boundaries  of  the  regions  to  detect 
the  contours. 


The  algorithm  was  applied  to  a  synthetic  image 
that  contained  four  regions;  each  region  was  obtained 
by  adding  a  pseudorandom  Gaussian  noise  to  a 
constant  value;  the  variance  of  the  Gaussian  noise 
was  equal  to  16,  and  its  dynamic  was  190.  The 
algorithm  produced  excellent  results,  considering 
the noise in the image. The results of Gambotto’s
report  are  given  in  two  phases:  the  noisy  image  is 
first  passed  through  an  initial  segmentation  and  then 
through  a  final  segmentation  phase. 

5.3.1  Segmenting  Contour  Line  Images 

The  Gorman  and  Weill22  segmentation  algorithm 
groups  contour  lines  into  regions.  Their  method  is 
based  in  part  on  a  parallel-adjacency  criterion,  which 
is  defined  in  their  paper.  The  algorithm  was  applied 
to  several  contour  line  images,  and  the  resultant 
regions  were  given.  The  apparent  key  to  this 
algorithm  is  the  way  in  which  it  combines  image 
properties  to  recognize  line  regions.  The  main  steps 
in  segmenting  contour  line  images  with  this  algorithm 
follow: 

• Split lines at all junctions (bifurcations and line
crossings).

• Perform piecewise straight-line fitting so that
each line is composed of straight-line segments.

•  Construct  an  adjacency  list  of  the  straight-line 
segments.  This  segment  adjacency  list  (SAL) 
contains,  for  each  segment,  all  other  segments  that 
meet  the  criteria  for  proximity,  approximate 
parallelism,  and  nonzero  overlap  with  respect  to 
that  segment. 

•  Merge  the  segments  of  the  SAL  into  groups  on 
the  basis  of  pairwise  similarity  of  line  segments 
due  to  the  parallel-adjacency  criterion.  The  result 
is  the  segment  group  list  (SGL). 

•  Consider  each  line  in  its  entirety  (made  up  of 
the  straight  line  segments)  and  group  the  lines,  again 
in  pairwise  fashion,  based  on  line  adjacency  and 
similar  composition  of  line  segments  from  the  SGL. 
The  result  is  the  line  region  list  that  contains  line 
composition  of  each  contour  line  region. 

The performance of this algorithm is dependent
on six parameters: maximum distance tolerance
between adjacent segments, minimum overlap
tolerance between adjacent segments, maximum
angular tolerance between adjacent segments,
minimum number of segments per group, minimum
number of lines per region, and average line spacing.
The experimental results show that the regions
determined by the algorithm, when applied to both
synthetic and real images, are consistent.

5.3.2  Double  Hierarchy  of  Fusion

Gagalowicz  and  Monga23  report  a  method  of 
region-growing  that  defines  a  double  hierarchy 
of  fusion  of  adjacent  region  pairs.  The  first  level  of 
the  hierarchy  is  defined  by  the  increasing  values 
of  a  merge  criterion  and  the  second  one  by  the 
order  of  the  fusion  criteria  used  successively. 

This double hierarchy allows the algorithm to
create sequentially more and more global
segmentations. They showed that it was easy to introduce
semantic fusion criteria due to the richness of the
information available at each iteration. The algorithm
did not derive its strength from the classical criteria
used, but from the merge strategy, the order, and
the succession of the various criteria.

5.4  Global-Local  Edge  Coincidence 

Hall24 notes that segmentation may be either
edge-based or region-based. The edge-based method works
well but is sensitive to local noise. A so-called global
method is more stable, since it considers the overall
characteristics of the image; however, false
regions may often be detected. For perfect region
segmentation, the global region boundaries should
coincide with the local edges. The global-local edge
coincidence (GLEC) segmentation method detects
the coincidence of the region boundaries (or global
edges) and local edges. Since the global edges are
obtained from the global characteristics of the image
(for instance, the histogram of the intensity of the
image), the local noise edges will not be detected
in the global edge map; however, perfect region
segmentation is rarely obtained by using only the
intensity information. Basically, GLEC is a
merge-oriented region segmentation method. Two regions
are merged if the common boundary between these
regions does not match the local edges.

5.5  Implementing  Data  Structures  in 
Pixel-Based  Segmentation 

Region-based segmentation methods require the
use of other data structures in addition to the original
pixel array. For merging, we can represent the regions
(at a given stage of the merging process) as nodes
on a graph, with pairs of nodes joined by arcs if
the corresponding regions are adjacent. The statistics
associated with each region can be stored at its
node; if it is possible to compute the statistics for
a (tentatively) merged pair of regions directly from
those for the individual regions, the merging process
can be done directly on the graph, without
reaccessing the original image. For splitting, the
quadrants, subquadrants, etc., can be represented
at a given stage of the process as nodes of a quadtree,
where the root is the whole image and the
quadtree nodes correspond to its quadrants. Here,
it is necessary to refer to the original image to
compute the statistics of the subregions each time
a region is split.
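
As an illustration of merging directly on the graph, the sketch below stores three sufficient statistics at each node (pixel count, sum, and sum of squares of gray values), from which the mean and variance of a merged pair follow without touching the image. The names `RegionStats`, `merged`, and `merge_nodes`, and the choice of these particular statistics, are assumptions made for this example and are not taken from the surveyed papers.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    """Sufficient statistics kept at one node of the region graph."""
    n: int      # number of pixels in the region
    s: float    # sum of gray values
    ss: float   # sum of squared gray values

    @property
    def mean(self) -> float:
        return self.s / self.n

    @property
    def variance(self) -> float:
        # Population variance computed from the stored sums.
        return self.ss / self.n - self.mean ** 2


def merged(a: RegionStats, b: RegionStats) -> RegionStats:
    """Statistics of the union of two disjoint regions."""
    return RegionStats(a.n + b.n, a.s + b.s, a.ss + b.ss)


def merge_nodes(stats, adjacency, i, j):
    """Merge region j into region i, updating only the graph.

    stats: dict mapping region id -> RegionStats
    adjacency: dict mapping region id -> set of adjacent region ids
    """
    stats[i] = merged(stats[i], stats.pop(j))
    for k in adjacency.pop(j):
        adjacency[k].discard(j)
        if k != i:
            # Former neighbors of j become neighbors of i.
            adjacency[k].add(i)
            adjacency[i].add(k)
```

A tentative merge can be evaluated by calling `merged` alone and inspecting the resulting variance; `merge_nodes` is applied only once the merge is accepted.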

The  fact  that  region-based  segmentation  methods 
may  require  access  to  the  image  data  in  an  arbitrary 
order  is  a  potential  disadvantage  when  the  image 
must  be  accessed  from  peripheral  storage.  Thus, 
such  methods  are  best  applied  to  small  images. 
However,  region-based  methods  do  have  a  potential 
advantage:  in  principle,  they  can  be  designed  to 
incorporate  information  about  the  types  of  regions 
(sizes,  shapes,  colors,  textures,  etc.)  that  are  expected 
to  occur  in  images  of  a  given  class;  thus,  merging 
or  splitting  can  be  inhibited  if  either  would  violate 
restrictions  on  the  expected  types  of  regions.  As  a 
classic  example,  a  region-based  approach  can  be 
used  to  “grow”  or  “track”  global  edges  (or  curves) 
in  an  image,  starting  from  pixels  that  have  high 
edge  magnitudes  and  accepting  new  pixels  (i.e., 
merging  them  with  the  edge  fragments  already 
constructed)  if  they  continue  along  these  edges. 

Another  advantage  of  these  so-called  pixel-based 
segmentation  methods  over  normal  region- 
based  segmentation  is  that  pixel-based  schemes  can 
be  greatly  accelerated  if  parallel  hardware  is 
available.  This  rapidity  is  accomplished  by  dividing 
the  image  into  parts  and  assigning  a  separate 
processor  to  segment  each  part;  the  processors  can 
share  global  information  about  segmentation  criteria, 
if  desired,  and  they  may  also  share  neighbor 
information  along  the  common  borders  of  the  parts. 
In  principle,  parallelism  could  also  be  used  in  region- 
based  schemes  by  assigning  processors  to  regions 
or  to  sets  of  regions,  but  this  method  would  require 
an  extremely  flexible  interprocessor  communication 
scheme  to  allow  processors  that  contain  information 
about  adjacent  regions  to  communicate.  In  pixel- 
based  schemes,  however,  the  image  can  be  divided 
into  square  blocks,  for  instance,  so  that  the  processor 
responsible  for  a  given  block  needs  only  to 
communicate  with  a  limited  number  of  processors 
that  are  responsible  for  neighboring  blocks.25 

5.6  Clustering 

Clustering refers to a class of algorithms used
extensively for image segmentation. Clustering
assembles unlabeled data into sets, or clusters, of
data points with strong internal similarity. Data point
values represent characteristic features of interest
such as grayscale, color brightness, contrast, etc.
During the cluster operation, the clusters are assigned
labels that are mapped back into the image, so that
the original pixel values are replaced. These labels
can be thought of as “class membership” indicators.
Similarity is most commonly measured by a distance
function in feature space. It is generally desirable
to make this function independent of any relevant
image transformations being performed (e.g.,
rotation, translation, or scaling). A criterion function
is also used to measure the clustering quality of
any given partition of the image function values.

The basic clustering operation examines each pixel
individually and assigns it to the cluster that best
represents the value of its characteristic vector. This
assignment is made according to the selected measure
of similarity, and the resulting partition is evaluated
with the criterion function that measures clustering
quality. The process is repeated, if necessary, until
some condition is satisfied by the current grouping
of data points. For example, if similarity is measured
in terms of the distance between a pixel's feature
vector and the cluster centers, then two cluster
centers can be assigned
the initial values M1 = M - S and M2 = M + S,
where M is the mean feature vector as measured
over the entire image and S is the standard deviation.
Clustering, then, would be achieved in the following
steps:

• Assign feature vectors to closest cluster centers.

• Compute new cluster centers.

• Compare new and old cluster centers: if they
are close enough, then terminate the algorithm; if
not, then repeat the procedure from the assignment
step.
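
The three steps above can be sketched as follows for the two-cluster case with initial centers M - S and M + S. This is a minimal illustration, assuming Euclidean distance as the similarity measure and a per-feature standard deviation for S; the function name `two_cluster_segment` and the parameters `tol` and `max_iter` are hypothetical.

```python
import numpy as np


def two_cluster_segment(features, tol=1e-3, max_iter=100):
    """Two-cluster iterative clustering of feature vectors.

    features: (N,) gray values or (N, D) feature vectors.
    Returns (labels, centers), with labels in {0, 1}.
    """
    x = np.asarray(features, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    m = x.mean(axis=0)           # M: mean feature vector over the image
    s = x.std(axis=0)            # S: standard deviation (assumed per feature)
    centers = np.stack([m - s, m + s])
    for _ in range(max_iter):
        # Step 1: assign each feature vector to the closest center.
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 2: compute new cluster centers (keep old one if cluster is empty).
        new_centers = centers.copy()
        for k in range(2):
            if np.any(labels == k):
                new_centers[k] = x[labels == k].mean(axis=0)
        # Step 3: stop when the centers have moved less than tol.
        moved = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if moved < tol:
            break
    return labels, centers
```

For a gray-scale image `img`, the labels can be mapped back with `labels.reshape(img.shape)` after calling `two_cluster_segment(img.ravel())`.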

The  following  issues  must  be  considered  during 
clustering: 

•  The  choice  of  a  similarity  measure. 

•  The  choice  of  a  criterion  function. 

•  Determination  of  the  appropriate  number  of 
clusters. 

•  Establishing  properties  of  solutions. 

5.6.1  Relaxation  Labeling 

Relaxation labeling provides an improvement to
the traditional clustering technique. Instead of
mapping a single cluster label back to each image
point, the probability that an image point belongs
to each of the clusters is mapped back to the image.
A relaxation process is applied where similar labels
will support each other, whereas different labels will
compete over neighborhoods. The probabilities are
iteratively updated until convergence is reached.

Relaxation was first introduced by Rosenfeld and
Kak,3 who define it as an iterative approach to
segmentation. The approach makes fuzzy or
probabilistic classification “decisions” at every point in
parallel and at each iteration. It then adjusts
these decisions at successive iterations based on the
decisions made at the preceding iteration for
neighboring points. The technique is called relaxation
because it resembles a class of iterative numerical
methods. The approach is order-independent and
can be greatly accelerated by parallel processing.
Since each iteration is parallel, only a few iterations
are usually necessary. Relaxation is more powerful
than one-shot parallel methods, since its initial
classifications are refined at each iteration, based
on the local context. This approach makes tentative
classifications at each stage and repeatedly
reconsiders them, unlike other methods that usually
make decisions only once at each point (except in
cases where sequential methods allow backtracking).

Relaxation  labeling  estimates  the  relative 
likelihoods  of  nodes  in  a  graph  and  then  reduces 
the  labeling  ambiguities  in  an  image.  The  problem 
can  be  formulated  by  defining  the  following:  a  set 
of  nodes,  a  set  of  labels  for  each  node,  an  initial 
assignment  of  probabilities  for  the  labels  of  each 
node,  a  set  of  arcs  between  nodes  to  indicate 
neighboring  relations,  a  constraint  relation  between 
node  labels,  and  an  updating  rule  to  refine  the 
probabilistic  assignment  of  labels. 
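
To make the formulation concrete, the sketch below implements one commonly cited nonlinear updating rule, in which each label probability is multiplied by (1 + q), where q is the average support received from neighboring nodes, and the result is renormalized. The survey does not specify an updating formula, so this rule, the uniform weighting of neighbors, and the names `relax`, `neighbors`, and `r` are assumptions made for illustration.

```python
import numpy as np


def relax(p, neighbors, r, n_iter=10):
    """Probabilistic relaxation labeling (illustrative sketch).

    p:         (N, L) initial label probabilities; each row sums to 1.
    neighbors: neighbors[i] lists the nodes joined to node i by an arc.
    r:         (L, L) compatibility coefficients in [-1, 1]; r[a, b] is
               the support that label b on a neighbor lends to label a.
    """
    p = np.asarray(p, dtype=float).copy()
    r = np.asarray(r, dtype=float)
    for _ in range(n_iter):
        q = np.zeros_like(p)
        for i, nbrs in enumerate(neighbors):
            if nbrs:
                # Average support for each label of node i from its neighbors.
                q[i] = np.mean([r @ p[j] for j in nbrs], axis=0)
        # All nodes are updated in parallel from the previous iteration.
        # The row sum is assumed to stay positive (true unless a node's
        # labels are all fully contradicted by its neighbors).
        new_p = p * (1.0 + q)
        p = new_p / new_p.sum(axis=1, keepdims=True)
    return p
```

With a compatibility matrix that rewards equal labels and penalizes different ones, an ambiguous node between two confident neighbors is pulled toward their label after a few iterations.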

Some  problems  associated  with  relaxation  labeling 
include  the  choice  of  an  appropriate  updating  formula 
for  the  probabilities,  difficulty  in  asserting 
convergence,  and  difficulty  in  establishing  properties 
of  the  solution.26 

Kittler and Illingworth27 reviewed various
relaxation labeling algorithms and detailed the need
to incorporate contextual information into the
interpretation of objects. Their literature review
highlighted the following techniques. Ullmann (Trans.
IRE(IT) 8(5):74-81, 1962) exploited constraints
imposed by triplets of pattern primitives to
substantially reduce the errors that occur with a
pattern recognition system after a learning sequence
of fixed length. Clowes (Artificial Intelligence
2:79-116, 1971) and Huffman (Machine Intelligence
6:295-323, 1971) used constraints between
straight-line segments to eliminate nonsensical interpretations
of an ideal line drawing representing a set of
polyhedra.

The pioneering work in relaxation labeling is
normally credited to Waltz (The Psychology of
Computer Vision, McGraw-Hill, 1975), who
considered the problem of line-drawing interpretation
studied earlier by Clowes and Huffman. His
formulation of the consistent labeling problem
allowed only unambiguous interpretation of line
segments, achieved by sequentially filtering out
inconsistent label pairs of connected segments. This
approach was then popularized by Rosenfeld et al.
(IEEE Transactions SMC 6(6):420-433, 1976), who
showed that Waltz’s filtering can be carried out in
parallel and could therefore be implemented as a
network of processors, each associated with one
object in the image.

However, the problem considered by Waltz is
somewhat unrealistic, since no information as to
the identity of each line segment is assumed to be
available. In practice, when analyzing real imagery,
it is reasonable to assume that some useful
information could be extracted from the raw image data.
Conversely, the edge representation of a scene is
unlikely to look like an ideal line drawing. Rosenfeld
et al. argued that the line-drawing interpretation is
better formulated in the continuous domain than as
a discrete relaxation, even though the latter enforces
unambiguous labeling. Fuzzy set and probabilistic
frameworks were considered in this respect, but the
latter seems to have attracted the most attention to
date.

Relaxation has been applied to general cases of
improving multi-label classification of multispectral
data, especially remotely sensed data and color
images. Most of these applications have been
approached in the same way. Model clusters are
defined in the measurement hyperspace either by
automatic clustering or by hand segmentation of
ground truth data. The initial label probabilities are
calculated as a simple function of distance between
the models and the pixel data. The interrelationships
among labels are derived empirically from
ground truth data by measuring and globally
averaging transitional probabilities, correlations, or
the mutual information of local pixels. Upon applying
these methods, most investigators reported a sharp
initial decrease of several percent in classification
error rates, followed by a smaller increase as the
process converges to a stable solution.

Relaxation labeling has been widely used in matching
problems, such as two-dimensional (2-D) shape
matching or stereo correspondence. Representative
feature points, such as corners, are extracted from
a template shape and a corresponding real-world
image. Initial probabilities can be assigned on the
basis of the degree of match between these chosen
features, and then these probabilities can be
iteratively reinforced on the basis of the occurrence of
matches between other features on the template and
their respective real-world features.

The stereo correspondence problem is similar.
Features are extracted in two images; the features
of one image can be regarded as nodes of a graph,
and the features in the second image are possible
labels for the nodes. The initial probabilities of labels
are a simple function of the distance between node
and label points in the two images. Node and label
assignments are then compatible if similar
neighborhood states exist in both images. Although these
relaxation schemes might not have great
computational advantages relative to other standard
matching methods, they are more tolerant of image
distortion. In addition, the reinforcing processes are
local, so missing matches can be tolerated; therefore,
occluded objects can be recognized.20

5.7  Symmetric  Nearest  Neighbor 

A  powerful  symmetric  nearest  neighbor  (SNN) 
filter  can  be  used  for  edge-preserving  smoothing  of 
gray-scale  images.  It  uses  both  spatial  and  nearest- 
neighbor  constraints  on  image  pixels  to  smooth  an 
image.  To  compute  the  gray  value  for  the  center 
pixel  in  a  neighborhood,  it  selects  half  the  number 
of  pixels  in  the  neighborhood:  from  each  pair  of 
pixels  located  symmetrically  on  opposite  sides  of  the 
center  pixel,  the  one  that  is  closer  in  gray  value  to 
the  center  pixel  is  selected.  In  case  of  tied  pairs,  the 
mean  of  the  pair  is  used.  Then  the  mean  value  of 
those  selected  is  substituted  for  the  original  value. 
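
The selection rule just described can be sketched for a 3 x 3 neighborhood as follows. The border handling (replicating edge pixels) and the function name `snn_filter` are assumptions made for this example; the source does not say how image borders are treated.

```python
import numpy as np


def snn_filter(image):
    """One pass of a 3 x 3 symmetric nearest neighbor filter (sketch)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")   # assumed border treatment
    center = padded[1:-1, 1:-1]
    # The four pairs of offsets lying symmetrically about the center pixel.
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, -1), (0, 1))]
    total = np.zeros_like(img)
    for (dy1, dx1), (dy2, dx2) in pairs:
        a = padded[1 + dy1:1 + dy1 + h, 1 + dx1:1 + dx1 + w]
        b = padded[1 + dy2:1 + dy2 + h, 1 + dx2:1 + dx2 + w]
        da = np.abs(a - center)
        db = np.abs(b - center)
        # Keep the member of each pair closer in gray value to the center;
        # for tied pairs, use the mean of the pair.
        total += np.where(da < db, a, np.where(db < da, b, 0.5 * (a + b)))
    # The mean of the selected values replaces the original value.
    return total / len(pairs)
```

On an ideal step edge the filter leaves the image unchanged, because every selected pixel lies on the same side of the edge as the center pixel.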

To find SNNs for a multiband image, the following
procedure was proposed:

• Compute the multidimensional cumulative
histogram of absolute color differences.

• Compute the absolute color differences between
the two pixels in the pair and the central pixel, for
each symmetric pair of neighbors in a
neighborhood.

The  pixel  with  the  higher  frequency  in  the 
cumulative  histogram  (smaller  color  difference)  is 
selected.  In  case  of  ties,  the  mean  of  the  symmetric 
pair  is  used.  The  mean  of  the  values  of  the  set  of 
pixels  selected  is  assigned  to  the  center  pixel  on 
each  band. 

The color-SNN filter can be iterated, and it
converges without producing artifacts; normally only
minute changes occur in the image after two to
three iterations. The hardware implementation of
the color-SNN using a 3 x 3 neighborhood should
be almost as straightforward as in the case of the
basic SNN filter.

5.8  Connected  Components  Algorithm 

Color  segmentation  combines  edge-preserving 
smoothing  with  a  simple  connected  components  (CC) 
algorithm.  Using  the  CC,  adjacent  pixels  are  said 
to  be  connected  if  the  likelihood  or  frequency  of 
the  color  difference  is  large  (so  the  magnitude  of  the 
difference  is  small).  The  algorithms  make  use  of 
the  combined  information  in  a  two-band  image. 

First, the image is smoothed by the color-SNN
filter. Normally, about three iterations of 3 x 3
filtering are needed to sharpen edges and smooth
homogeneous areas. To make edges even sharper
and to avoid mismerging of regions at some critical
points, bands are edge-enhanced with a gray-scale
filter known as a MINRANGE filter. The
MINRANGE replaces the center value of a 3 x 3
neighborhood with the mean of the 4-pixel corner
subgroup that has the smallest range. Because
sharpening is applied to almost completely smoothed
bands, no artifacts are generated.
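
A possible reading of the MINRANGE rule is sketched below. It assumes that the four "corner subgroups" are the four 2 x 2 blocks of the 3 x 3 neighborhood that each contain the center pixel and one corner; the source does not define the subgroups further, so this interpretation, the edge-replicating border treatment, and the name `minrange_filter` are all assumptions.

```python
import numpy as np


def minrange_filter(band):
    """MINRANGE edge enhancement of one gray-scale band (sketch)."""
    img = np.asarray(band, dtype=float)
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")        # assumed border treatment
    best_range = np.full((h, w), np.inf)
    out = img.copy()
    # (dy, dx) selects the top-left corner of each 2 x 2 block in the
    # padded 3 x 3 neighborhood; every block includes the center pixel.
    for dy in (0, 1):
        for dx in (0, 1):
            block = np.stack([
                p[dy + i:dy + i + h, dx + j:dx + j + w]
                for i in (0, 1) for j in (0, 1)
            ])
            rng = block.max(axis=0) - block.min(axis=0)
            mean = block.mean(axis=0)
            # Keep the mean of the subgroup with the smallest range so far.
            better = rng < best_range
            out[better] = mean[better]
            best_range[better] = rng[better]
    return out
```

Under this reading, a pixel on a blurred edge is pulled toward the more homogeneous side, which is consistent with the sharpening effect described above.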

After the color image is smoothed, it is segmented
by a two-pass CC algorithm. Here, adjacent pixels
are said to be connected if the likelihood or frequency
of their absolute color difference is sufficiently
large. The only parameter is a threshold, which is
expressed as a percentile of frequencies supplied
by the user.

The  two-band  histogram  of  absolute  color 
differences  is  then  computed.  It  is  first  converted 
to  a  two-band  histogram  of  cumulative  frequencies 
and  then  to  percentiles  of  their  distribution. 

The two passes of the CC algorithm are the same
as those of the standard algorithm for binary images.
Row by row, pixels are assigned labels by
comparing each pixel with the four adjacent pixels above
or to the left, which have already been labeled as
the image is scanned from top to bottom and from
left to right. Then, in the second pass, the pixels
with component-equivalent labels are relabeled
uniquely.
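
The two passes can be sketched as follows. To keep the example short, it works on a single band and declares two adjacent pixels connected when their absolute gray-value difference does not exceed a fixed `threshold`; this plain difference test stands in for the percentile test on the cumulative histogram of color differences described above. The function name and the union-find bookkeeping are choices made for this illustration.

```python
import numpy as np


def connected_components(image, threshold):
    """Two-pass connected components labeling on one band (sketch)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    labels = np.zeros((h, w), dtype=int)
    parent = [0]   # parent[k]: union-find parent of label k (index 0 unused)

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]   # path halving
            k = parent[k]
        return k

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: compare each pixel with the four already-labeled
    # neighbors (left, upper-left, upper, upper-right).
    for y in range(h):
        for x in range(w):
            current = 0
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if ny < 0 or nx < 0 or nx >= w:
                    continue
                if abs(img[y, x] - img[ny, nx]) <= threshold:
                    other = int(labels[ny, nx])
                    if current == 0:
                        current = other
                    else:
                        union(current, other)   # record label equivalence
            if current == 0:
                parent.append(len(parent))      # start a new label
                current = len(parent) - 1
            labels[y, x] = current

    # Second pass: relabel equivalent labels uniquely.
    final = {}
    for y in range(h):
        for x in range(w):
            root = find(int(labels[y, x]))
            labels[y, x] = final.setdefault(root, len(final) + 1)
    return labels
```

Replacing the difference test with a lookup into a percentile table of color differences would bring the sketch closer to the method described in the text.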

Therefore, the CC algorithm provides a new
measure of edge information for color images based
on cumulative histograms of absolute color
differences. CC-based methods may be used for
edge-preserving smoothing and segmentation. Such
methods are relatively simple: they do not have to
process many parameters, and they give good results
for many different types of images even when using
the same set of parameter values for these different
images.

5.9  Munsell  Color  Coordinate  System 

Tominaga28 presents a method for segmenting a
color image into meaningful regions using three
different perceptual attributes of color: hue,
lightness, and saturation. This segmentation technique
is based on a recursive thresholding method using
three histograms, one to depict the range of each
attribute. The Munsell color coordinate system (hue,
value, and chroma) is the color space used to best
represent human color perception. This color
specification method predicts the color perception
of a measured image. A practical segmentation
procedure is then presented. A set of subregions
with uniform color is extracted from the recursive
thresholding on the peak of the histogram set. This
operation is repeated to generate a sequence of
uniform color regions in the image.

Tominaga  describes  the  segmentation  procedure 
in  six  key  steps: 

1. Histograms are computed for each attribute of
hue, value, and chroma, using either the entire image
as one region or using specified regions within the
image. The histograms are smoothed by a moving
average to eliminate small peaks.

2. The most significant peak is found in the set
of histograms. Peak selection is based on a shape
analysis performed on each peak in the histograms.
First, some clear peaks are isolated as candidates.
Next, the following criterion function (f) is calculated
for each candidate peak:

f = (Sp / Ta) (100 / FWHM),    (3)

where Sp represents a peak area between two valleys,
V1 and V2 (the lower and upper bounds,
respectively), and FWHM is the full width of the
peak at half-maximum. Ta denotes the overall area
of the histogram, that is, the total number of pixels
in the specified image region.

3. Thresholding of a color image is executed using
two threshold values derived from the lower bound
V1 and the upper bound V2 for the most
significant peak in the set of three histograms. This
thresholding operation partitions an image region
into two sets of subregions. One set consists of
subregions corresponding to the color attributes
within the threshold limits; the other is a set of
subregions with the remaining attribute values. Only
the former set is extracted.

4.  The  thresholding  process  is  repeated  for  the 
extracted  subregions.  The  area  of  subregions 
decreases  with  each  succeeding  threshold.  This 
process  leads  to  the  detection  of  the  most  significant 
cluster.  If  all  the  histograms  become  monomodal, 
then  the  cluster  detection  is  finished.  One-step 
segmentation  is  thus  completed,  and  a  suitable  label 
is  assigned  to  the  latest  extracted  subregions. 

5.  The  image  labeled  by  the  above  segmentation 
is  smoothed  on  the  basis  of  pixel  connectedness. 
This  refinement  is  intended  to  smooth  out  noisy 
boundaries  and  eliminate  small  regions  and  short 
lines.  The  8-connection  property  is  used  in  this 
smoothing  algorithm.  The  operation  uses  multilevel, 
rather  than  binary,  smoothing. 

6. Steps 1 through 5 are repeated for the remaining
regions. The segmentation process is terminated when
one area is sufficiently small in comparison to the
original image size, or when no histogram has
sufficient peaks. The remaining unlabeled pixels
are regarded as noisy fluctuations and are merged
into neighboring labeled regions of similar colors.
The mean values of the color specifications are
computed, and a color difference formula is used
to choose the nearest color region.
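
The peak criterion of step 2, f = (Sp / Ta)(100 / FWHM), can be evaluated on a smoothed histogram as in the sketch below. The way the valleys V1 and V2 are located (walking downhill from the peak until the histogram rises again) and the measurement of FWHM in histogram bins are assumptions made for this example; the function name `peak_criterion` is hypothetical.

```python
import numpy as np


def peak_criterion(hist, peak):
    """Evaluate f = (Sp / Ta) * (100 / FWHM) for one candidate peak.

    hist: smoothed 1-D histogram (counts per bin).
    peak: bin index of the candidate peak.
    Returns (f, v1, v2), where v1 and v2 are the valley bins.
    """
    h = np.asarray(hist, dtype=float)
    # Walk downhill on each side until the histogram starts to rise.
    v1 = peak
    while v1 > 0 and h[v1 - 1] <= h[v1]:
        v1 -= 1
    v2 = peak
    while v2 < len(h) - 1 and h[v2 + 1] <= h[v2]:
        v2 += 1
    sp = h[v1:v2 + 1].sum()    # Sp: peak area between the two valleys
    ta = h.sum()               # Ta: overall area of the histogram
    # Full width at half-maximum, counted in bins around the peak.
    half = h[peak] / 2.0
    left = peak
    while left > v1 and h[left - 1] >= half:
        left -= 1
    right = peak
    while right < v2 and h[right + 1] >= half:
        right += 1
    fwhm = right - left + 1
    return (sp / ta) * (100.0 / fwhm), v1, v2
```

The candidate with the largest f across the hue, value, and chroma histograms would then supply the thresholds V1 and V2 used in step 3.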

This  method  has  been  developed  for  segmenting 
a  color  image  into  regions  with  perceptually  uniform 
colors  by  means  of  the  three  Munsell  color  attributes. 
The  color  specification  process  was  first  presented 
to  predict  color  perception  of  measured  images. 
Experimental  results  presented  by  Tominaga 
demonstrate  the  feasibility  of  this  method. 

5.10  Pattern  Recognition  Using  the
Commission  Internationale  de  l’Éclairage
Color  System

A new computational pattern recognition technique
is being used by Celenk and Smith29 to segment
color images of natural scenes. This technique is
an unsupervised operation that detects image clusters
using one-dimensional (1-D) histograms of the color
or feature coordinates in the selected uniform
color system or constructed feature space. The
detected clusters are then extracted by projecting
and separating the classes two at a time. This method
tends to integrate the statistical data analysis concept
of pattern recognition theory with the fundamental
premises of human color perception. It does not
operate blindly: an image segment is accepted only
after all of its spectral neighbors in the uniform
color space are considered. Although this algorithm
was developed as an unsupervised operation, it can
be implemented in a supervised mode by introducing
external knowledge or training prototypes to the
system.

The authors’ recursive procedure tends to integrate
the fundamental methods of human color perception
with the statistical data analysis concept of
mathematical pattern recognition. It detects image clusters
efficiently and determines their boundaries correctly
in the CIE uniform color system (L*, a*, b*), which
is selected as the feature space. With each iteration,
the algorithm detects the most prominent cluster or
mode and all of its spectral neighbors in the uniform
color space. To make the detection process
computationally efficient, the procedure first approximates
the underlying clusters with circular-cylindrical
volume elements. This approximation provides the
best estimates of the 3-D color distributions for
these clusters in accordance with the color perception
mechanism of the human eye. The boundaries of
each volume element are composed of two constant
luminance planes, two constant circular chroma
cylinders (or loci), and two constant angular hue
planes (or loci). Each is derived from the sequentially
constructed 1-D zero-, first-, and second-order
parametric histograms of the cylindrical coordinates
(i.e., L* = lightness, H* = hue, C* = chroma) of the
uniform color space. To extract the image region
that correctly corresponds to the most prominent
cluster (i.e., without significantly deforming its
spatial configuration in the image domain), the color
vectors lying within the range of the 3-D color
distributions of this mode and one of its spectral
neighbors are projected onto a line so that the
projected color points are well separated and
clustered for 1-D thresholding. This projecting
and thresholding process is repeated until the selected
mode is isolated from all of its neighbors in the
uniform color space. The orientation of the line
used for every projection is determined according
to the Fisher criterion postulated as the measure of
effectiveness of a linear discriminant function.
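
The projection step can be illustrated with the standard two-class Fisher linear discriminant, which chooses the line direction w that maximizes the separation of the projected class means relative to the within-class scatter. The sketch below is a generic textbook version, not the authors' implementation: the small regularization term, the midpoint threshold, and the function names are assumptions.

```python
import numpy as np


def fisher_direction(a, b):
    """Unit vector w maximizing the Fisher criterion for classes a and b.

    a, b: (Na, D) and (Nb, D) arrays of color vectors, e.g. (L*, a*, b*).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ma, mb = a.mean(axis=0), b.mean(axis=0)
    # Within-class scatter matrix Sw.
    sw = (a - ma).T @ (a - ma) + (b - mb).T @ (b - mb)
    # w is proportional to Sw^-1 (mb - ma); a tiny ridge guards against
    # a singular scatter matrix (an assumption of this sketch).
    w = np.linalg.solve(sw + 1e-9 * np.eye(sw.shape[0]), mb - ma)
    return w / np.linalg.norm(w)


def project_and_threshold(a, b):
    """Project both classes onto the Fisher line; return (w, t).

    t is a 1-D threshold placed midway between the projected class
    means (one simple choice among several).
    """
    w = fisher_direction(a, b)
    pa = np.asarray(a, dtype=float) @ w
    pb = np.asarray(b, dtype=float) @ w
    return w, 0.5 * (pa.mean() + pb.mean())
```

A color vector c is then assigned to the second class when `c @ w > t` and to the first class otherwise; repeating this against each spectral neighbor isolates the selected mode.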

Celenk’s  and  Smith’s  technique  minimizes  the 
error  rate  associated  with  region  isolation  in 
the  feature  space.  The  computational  cost  involved 
in  region  (segment)  extraction  and  cluster  detection 
is  significantly  reduced  by  using  only  1-D  image 
histograms. 

The 1976 CIE (L*, a*, b*) uniform color system
used in Celenk’s and Smith’s procedure was chosen
as the feature space to best approximate human visual
perception. This system implicitly satisfies the
condition that numerical differences in the feature
space should be directly proportional to perceptual
differences in the human visual system.

5.11  Motion-Based  Segmentation

Bouthemy and Rivero30 present a new approach
to the motion-based segmentation problem. The
designed formalism includes 2-D motion models
and relies on explicit partial motion information
through a stochastic approach, which allows the
computation of a likelihood ratio test embedded in
a split-and-merge procedure. Therefore, regions are
structured according to motion homogeneity criteria
considered in a hierarchical way: segmentation starts
with as simple a motion model as possible, and
after a complete iteration cycle, a more elaborate
motion model can be taken into account (e.g., a
linear one).

The  authors  developed  a  new  region-based 
segmentation  method  that  relies  on  motion 
information  and  follows  a  stochastic  approach.  The 
criterion  for  a  spatio-temporal  homogeneity  decision 
is  the  computation  of  a  proper  likelihood  ratio  test 
based  on  some  motion  model.  They  present  a  constant 
motion  model  and  a  linear  model  in  a  hierarchical 
manner  within  a  split-and-merge  procedure. 

5.12  Edge  Detection  and  Noise  Reduction 
5.12.1  Statistical  Theories  of  Edge  Detection 

Edge detection is critical to segmentation and,
thus, to computer vision, since edges are essential
to  the  segmenting  of  regions.  Results  of  Huang’s 
and  Tseng’s31  research  on  edge  detection  are  noted 
here.  They  use  a  statistical  theory  of  hypothesis 
testing  to  apply  filtering  and  edge  detection 
simultaneously  to  a  noisy  image.  A  simple  decision 
rule  is  derived,  and  the  application  of  this  result  to 
more  complicated  situations  is  discussed  in  detail. 
The  decision  rule  can  decide  whether  there  is  an 
edge, a line, a point, a corner edge, or just a smooth
region in a given small neighborhood. Computational
by-products of the decision rule are the mean
and  the  variance  of  the  neighborhood,  which  can 
be  used  for  split  and  merge  analysis.  Utilization  of 
the  mean  can  essentially  filter  the  neighborhood 
pixels.  Huang’s  and  Tseng’s  new  technique  was 
implemented  to  run  on  a  VAX-11/780,  and  their 
experimental  results  indicate  a  high  feasibility  for 
the method. The generalization to more generic cases
is also discussed: Haralick’s sloped-facet model
seems to be the most suitable case. The decision
rules used by Huang and Tseng are computationally
intensive,  but  besides  simply  detecting  edges  they 
also  yield  line  and  point  detection,  as  well  as  good 
estimates  of  the  region’s  mean  and  variance  for 
further  analysis. 
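
The idea of deciding "edge or smooth region" by hypothesis testing, with the
neighborhood mean and variance as by-products, can be sketched with a simple
two-sample test. This is not Huang's and Tseng's decision rule: it handles
only a vertical step edge, and the statistic and threshold are assumptions
chosen for illustration. The window must be at least two pixels wide and two
pixels high.

```python
import statistics


def edge_test(window, t_threshold=3.0):
    """Test a small window (list of rows) for a vertical step edge.

    Returns (is_edge, mean, variance). The mean and variance of the whole
    window fall out of the computation and can feed a split-and-merge stage.
    """
    half = len(window[0]) // 2
    left = [p for row in window for p in row[:half]]
    right = [p for row in window for p in row[-half:]]
    pixels = [p for row in window for p in row]
    mean = statistics.fmean(pixels)
    variance = statistics.pvariance(pixels, mean)
    # Welch-style statistic: difference of half means over its standard error.
    se2 = (statistics.variance(left) / len(left)
           + statistics.variance(right) / len(right))
    diff = abs(statistics.fmean(left) - statistics.fmean(right))
    if se2 == 0.0:
        return diff > 0.0, mean, variance
    return diff / se2 ** 0.5 > t_threshold, mean, variance
```

When the test reports a smooth region, replacing the center pixel with the
returned mean acts as the filtering step mentioned above.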

5.12.2  Markov  Random  Fields 

Goutsias  and  Mendel32  also  address  the  issue  of 
noisy  images.  They  use  a  doubly  stochastic  image 
model  and  assume  that  the  image  is  the  sum  of 
the  realizations  of  two  independent  random  fields:  the 
uncorrupted  image  and  the  noise  field,  both  of  which 
consist  of  independent,  identically  distributed, 
Gaussian  random  variables.  Their  image 
segmentation  technique  represents  the  image  with 
a  semi-Markov  random  field  that  has  been  corrupted 
by  additive  white  noise.  Markov  random  fields 
(MRFs) are 2-D, noncausal, Markovian stochastic
processes.33  Goutsias  and  Mendel  develop  an 
adaptive  Bayesian  parameter  estimation/image 
detection  algorithm  to  estimate  the  unknown 
image  and  its  underlying  parameters  in  an  optimal 
manner.  Their  proposed  algorithm  is  demonstrated 
during  the  smoothing  and  segmenting  of  two 
4-gray-level real images. The semi-MRF is defined
in  terms  of  two,  not  necessarily  independent,  random 
fields:  an  MRF  that  describes  the  statistical  behavior 
of  the  boundary  pixels  (pixels  that  are  located  at 
the  boundary  between  adjacent  regions  of  the  image), 
and  a  random  field  that  describes  the  statistical 
behavior of the regional pixels (pixels that are located
inside  a  region)  of  the  image. 

Another  use  of  MRFs  is  documented  by  Murray 
and  Buxton,34  who  present  results  of  computer 
experiments  with  an  algorithm  to  perform  scene 
decomposition  and  motion  segmentation  from  visual 
motion  or  optic  flow.  The  maximum  a  posteriori 
(MAP)  criterion  is  used  to  formulate  the  best 
segmentation  or  interpretation  of  the  scene,  where 
the  scene  is  assumed  to  be  made  up  of  some  fixed 
number  of  moving  planar  surface  patches.  Their 
Bayesian  approach  has  two  prerequisites: 
specification  of  prior  expectations  for  the  optic  flow 
field,  which  Murray  and  Buxton  model  as  spatial 
and  temporal  MRFs,  and  a  way  of  measuring  how 
well  the  segmentation  predicts  the  measured  field. 
The  MRFs  incorporate  the  physical  constraints  that 
objects  and  their  images  are  probably  spatially 
continuous,  and  that  their  images  are  likely  to  move 
quite  smoothly  across  the  image  plane.  To  compute 
the flow predicted by the segmentation, a method
for  reconstructing  the  motion  and  orientation  of 
planar  surface  facets  is  used.  Then  the  search  for 
the  globally  optimal  segmentation  is  performed  using 
simulated  annealing.  Their  most  important  result 
was  the  formulation  of  scene  segmentation  from 
visual  motion  as  an  optimization  problem  that  is 
weakly  constrained,  or  guided,  by  prior  physical 
expectations. 
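
The combination of a MAP criterion, an MRF prior, and simulated annealing can
be illustrated on a much simpler problem than the one Murray and Buxton
address. The sketch below labels gray-level pixels rather than optic flow, and
it uses a Potts-style smoothness prior with known class means; all of these
choices, and every parameter value, are assumptions made for the example.

```python
import math
import random


def anneal_segmentation(image, means, beta=1.0, sigma=1.0,
                        t_start=4.0, t_end=0.05, sweeps=60, seed=0):
    """Approximate MAP labeling of `image` under a Potts-style MRF prior."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    labels = [[rng.randrange(len(means)) for _ in range(w)] for _ in range(h)]

    def local_energy(r, c, k):
        # Data term: squared misfit between the pixel and the class mean.
        e = (image[r][c] - means[k]) ** 2 / (2.0 * sigma ** 2)
        # Prior term: penalty for each 4-neighbor with a different label.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] != k:
                e += beta
        return e

    for s in range(sweeps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (s / max(sweeps - 1, 1))
        for r in range(h):
            for c in range(w):
                new = rng.randrange(len(means))
                d = local_energy(r, c, new) - local_energy(r, c, labels[r][c])
                # Metropolis rule: always accept improvements, sometimes
                # accept a worse label while the temperature is high.
                if d <= 0 or rng.random() < math.exp(-d / t):
                    labels[r][c] = new
    return labels
```

Note that the number of classes (`len(means)`) must be supplied in advance,
and that every sweep visits every pixel; both points mirror the practical
difficulties reported for this family of methods.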

Two  serious  problems  were  noted  in  Murray’s 
and  Buxton’s  experiment.  First,  the  current 
implementation  of  the  segmentation  process  requires 
the  specification  of  the  number  of  objects  likely  to 
be  found.  The  more  serious  problem  is  that  the 
algorithm  is  computationally  inefficient. 

5.12.3  Hierarchical  Edge  Detection 

McLean and Jernigan35 discuss the design of
efficient  edge-detection  operators.  The  need  for  such 
operators  is  reviewed  and  a  set  of  design  and 
performance  criteria  is  developed.  These  criteria 
are  then  used  to  evaluate  existing  edge-detection 
techniques,  as  well  as  to  suggest  some  new 
approaches  to  this  important  problem. 

The  following  requirements  for  an  edge  detector 
are  considered: 

• capable of working in a purely local context,

•  efficient  when  applied  in  any  order  (it  cannot 
derive  efficiency  by  exploiting  redundancies  when 
applied  in  a  particular  fashion), 

•  insensitive  to  the  orientation  or  to  the  magnitude 
of  the  edge, 

• able to work well in the presence of noise,

•  relatively  insensitive  to  threshold  specifications. 

McLean and Jernigan describe a method of hierarchical
edge detection (HED), which is based on
nonlinear edge detectors. HED is a two-step process
consisting of a coarse-oriented gradient measurement
followed by the application of a particular
orientation of one of the efficient 1-D edge detectors.
The  gradient  preprocessing  step  serves  as  an  initial 
filter,  so  that  only  those  pixels  which  exceed  the 
gradient  threshold  become  candidates  for  the  more 
expensive,  oriented,  edge-detection  scheme.  In  this 
manner,  the  HED  scheme  becomes  very  efficient: 
the  time  required  is  related  to  the  amount  of  edge 
and  noise  activity  that  exists  within  the  image. 

The  burden  of  edge  detection,  then,  falls  heavily 
on  the  gradient  preprocessing  scheme  used,  since  it 
must  not  remove  true  edges  and  must  block  as  many 
nonedge pixels as possible. Also, it must be
directionally sensitive, so that the correctly oriented
edge detector will be applied. Simple logic tests
were  used  to  determine  edge  orientation  after  the 
two  gradients  were  thresholded. 
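
The two-step structure described above can be sketched as follows. The coarse
gradient and the logic test on the two thresholded gradients follow the
description in the text; the oriented 1-D detector in step 2 is a stand-in
(a comparison of two-pixel averages on either side of the candidate), because
the nonlinear detectors of McLean and Jernigan are not specified here. The
function name and both thresholds are hypothetical.

```python
def hed_sketch(img, grad_thresh=20, edge_thresh=30):
    """Return {(row, col): orientation} for pixels accepted as edges."""
    h, w = len(img), len(img[0])
    edges = {}
    for r in range(2, h - 2):
        for c in range(2, w - 2):
            # Step 1: cheap, coarse gradient in two directions.
            gx = img[r][c + 1] - img[r][c - 1]
            gy = img[r + 1][c] - img[r - 1][c]
            hx, hy = abs(gx) > grad_thresh, abs(gy) > grad_thresh
            if not (hx or hy):
                continue  # most pixels stop here; this is the speed-up
            # Logic test on the thresholded gradients picks the orientation.
            if hx and not hy:
                orient = "vertical"     # intensity changes along the row
            elif hy and not hx:
                orient = "horizontal"   # intensity changes along the column
            else:
                orient = "vertical" if abs(gx) >= abs(gy) else "horizontal"
            # Step 2: costlier oriented 1-D test (stand-in detector).
            if orient == "vertical":
                a = (img[r][c - 2] + img[r][c - 1]) / 2
                b = (img[r][c + 1] + img[r][c + 2]) / 2
            else:
                a = (img[r - 2][c] + img[r - 1][c]) / 2
                b = (img[r + 1][c] + img[r + 2][c]) / 2
            if abs(a - b) > edge_thresh:
                edges[(r, c)] = orient
    return edges
```

On a flat image no pixel passes step 1, so the running time depends on the
amount of edge and noise activity rather than on image size alone.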

McLean’s and Jernigan’s HED-based method is
shown  to  perform  well.  An  encouraging  result  of 
their  work  is  that  effective  edge  detection  can  be 
performed  while  maintaining  a  highly  structured, 
highly  localized  approach  to  this  aspect  of  image 
processing.  The  HED  scheme  is  well  suited  for 
inclusion  in  systems  that  encompass  multiple  levels 
of processing. A future research task, and follow-on
to this effort, could investigate the possibility of
adapting  the  edge  extraction  process  to  follow 
contours,  thus  eliminating  wasted  processing. 

5.13  Splitting  and  Merging 
5.13.1  Attributed  String  Matching 
and  Merging 

Tsai  and  Yu36  give  a  new  structural  approach  to 
shape  recognition  that  utilizes  string  matching  with 
merging.  They  first  present  disadvantages  of 
conventional symbolic string matching, which uses
changes, deletions, and insertions. Attributed strings
are suggested for matching, where each attributed
string  is  an  ordered  sequence  of  shape  boundary 
primitives  that  represents  a  basic  boundary  structural 
unit  (a  line  segment)  with  two  types  of  numerical 
attributes  (a  length  and  a  direction).  A  new  type  of 
primitive  edit  operation,  called  a  merge,  is  then 
introduced.  The  merge  can  be  used  to  combine  and 
then  match  any  number  of  consecutive  boundary 
primitives  in  one  shape  with  those  in  another.  The 
resultant attributed string matching with merging
approach is then shown to be useful for recognizing
distorted shapes. Experimental results are also given
to demonstrate the feasibility of this approach.
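
A minimal sketch of edit-distance matching extended with a merge operation is
given below. Each primitive is a (length, direction in degrees) pair, and
merging consecutive primitives is modeled as a vector sum. The cost functions,
the merge penalty, and the limit on how many primitives may be merged are all
hypothetical; Tsai's and Yu's actual cost definitions are not reproduced here.

```python
import math


def merge(prims):
    """Merge consecutive (length, direction_deg) primitives by vector sum."""
    x = sum(ln * math.cos(math.radians(deg)) for ln, deg in prims)
    y = sum(ln * math.sin(math.radians(deg)) for ln, deg in prims)
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0


def change_cost(p, q):
    """Hypothetical cost of changing primitive p into primitive q."""
    dl = abs(p[0] - q[0]) / max(p[0], q[0], 1e-9)
    dd = abs(p[1] - q[1]) % 360.0
    return dl + min(dd, 360.0 - dd) / 180.0


def match_with_merging(a, b, max_merge=3, merge_pen=0.1, indel=1.0):
    """Edit distance between attributed strings a and b, allowing merges."""
    n, m = len(a), len(b)
    d = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                d[i][j] = min(d[i][j], d[i - 1][j] + indel)  # delete a[i-1]
            if j > 0:
                d[i][j] = min(d[i][j], d[i][j - 1] + indel)  # insert b[j-1]
            # Merge the last ka primitives of a and the last kb of b,
            # then match the two merged primitives against each other.
            for ka in range(1, min(i, max_merge) + 1):
                for kb in range(1, min(j, max_merge) + 1):
                    cost = change_cost(merge(a[i - ka:i]), merge(b[j - kb:j]))
                    cost += merge_pen * (ka + kb - 2)
                    d[i][j] = min(d[i][j], d[i - ka][j - kb] + cost)
    return d[n][m]
```

With `max_merge=1` the routine reduces to conventional string matching with
changes, deletions, and insertions, which makes the benefit of merging easy
to compare on a boundary that has been broken into extra segments.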

There  were  occasional  erroneous  classifications 
for  the  following  reasons: 

•  The  number  of  boundary  primitives  for  each 
shape  was  limited  to  10  to  increase  processing  speed. 

• The image resolution was low (128 × 128) for
the  relatively  high  shape  complexity. 

•  The  three  images  selected  for  this  test  (pliers) 
were  similar  in  shape. 

•  The  selection  of  thresholds  and  constants  used 
in  the  algorithms  was  not  optimal. 

•  Poor  computational  accuracy  resulted  from  the 
exclusive  use  of  integer  calculations  (to  avoid 
the  relatively  low-speed  real-number  computations 
on  the  available  microcomputer). 

The  authors  also  test  attributed  string  matching 
with  merging  on  images  with  no  occlusion.  They 
suggest that some extensions to this approach are
possible, e.g., the improvement of operation cost
functions,  the  inclusion  of  primitive  splitting  into 
the  matching  algorithm,  more  intelligent  solutions 
to  the  shape  orientation  problem,  applications  of 
the  proposed  approach  to  recognizing  occluded 
shapes,  etc. 

5.13.2  Segmenting  Range  Imagery  into 
Planar  Regions 

A  fast  technique  for  segmenting  range  imagery 
into  planar  regions  is  discussed  by  Taylor  et  al.37 
Range  images  provide  direct  measurements  of  the 
3-D  surface  coordinates  of  a  scene.  The  technique 
rapidly  divides  range  imagery  surfaces  into  regions 
that  satisfy  a  common  homogeneity  criterion.  Key 
features  that  enhance  the  algorithm’s  speed  include 
the  development  of  appropriate  region  descriptors 
and  the  use  of  fast  region  comparison  techniques 
for  segmentation  decisions.  Their  split-and-merge 
algorithm  bases  its  homogeneity  criterion  on  a  three- 
parameter,  planar  surface  description,  in  which 
the  three  parameters  are  two  angles  (to  describe  the 
orientation  of  the  normal  to  the  local  best  fit  plane) 
and  the  original  range  value.  Speed  is  achieved 
because  both  the  region  splitting  and  the  rejection 
of  merge  possibilities  can  often  be  based  on  simple 
comparisons  of  these  two  orientation  parameters. 
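
The orientation part of such a planar description can be sketched as below.
The sketch assumes a local plane fit of the form z = a*x + b*y + c and
describes its normal by a slant angle and a tilt angle; this parameterization,
the function names, and the tolerance are assumptions, since the exact
descriptors of Taylor et al. are not given here. The early rejection compares
the angle between two normals, which avoids the ambiguity of the tilt angle
for nearly frontal planes.

```python
import math


def normal_angles(a, b):
    """Angles (degrees) of the normal to the plane z = a*x + b*y + c."""
    nx, ny, nz = -a, -b, 1.0
    slant = math.degrees(math.atan2(math.hypot(nx, ny), nz))  # from the z axis
    tilt = math.degrees(math.atan2(ny, nx)) % 360.0           # in the x-y plane
    return slant, tilt


def angle_between(desc1, desc2):
    """Angle (degrees) between two normals given as (slant, tilt) pairs."""
    def unit(slant, tilt):
        s, t = math.radians(slant), math.radians(tilt)
        return (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))

    u, v = unit(*desc1), unit(*desc2)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(u, v))))
    return math.degrees(math.acos(dot))


def reject_merge(desc1, desc2, tol_deg=5.0):
    """Cheap early rejection: orientations differ too much to be one plane."""
    return angle_between(desc1, desc2) > tol_deg
```

Only region pairs that survive this test would need the more expensive
range continuity check.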

A  fast,  but  more  complex,  region-to-region  range 
continuity  test  is  also  developed  for  use  when  the 
orientation  homogeneity  tests  are  inconclusive. 
The  importance  of  merge  ordering  is  discussed:  one 
particularly  effective  ordering  technique,  which  is 
based  on  dynamic  criteria  relaxation,  is  demonstrated 
within  their  paper.  Sample  segmentations  of  simple 
and complex range data images are also shown, and
the effects of noise and preprocessing are examined.

The  authors  state  that  splitting  and  merging  are 
done  conservatively  in  their  algorithm  and  produce 
oversegmented images. Extra region boundaries are
detected, but no major boundaries are broken; thus,
the  oversegmentation  is  due  to  fragmentation  of  true 
regions.  Additional  merge  phases  with  relaxed 
homogeneity  criteria  are  used  to  reduce  this 
fragmentation. No general procedure was offered
for selecting the relaxed criterion values, but it was noted that the
values are not very sensitive for planar data, as
long as the general relaxation trends described by
the authors are followed.

Overall, the authors’ algorithm can rapidly
segment an object in a range image into a surface
composed of planar regions. The results from this
algorithm  show  that  it  could  be  an  initial  step  to  a 
more  complex  merging  technique. 

5.13.3  Segmenting  Aerial  Photographs 

A  method  of  segmenting  aerial  photographs, 
described  by  Laprade,38  approximates  the  image 
intensity  surface  with  planar  facets.  The 
approximation is accomplished by using a split-and-merge
approach that is somewhat different from
those  previously  mentioned.  A  combination  of  an 
F-test  and  a  mean  predicate  is  used  to  test  the 
uniformity of regions. When two regions are merged
into  a  new  region,  nine  variables  are  needed  to 
compute  the  least-squares  plane  (the  components 
of  the  3  x  3  matrix  in  the  normal  equations)  for  the 
new  region.  These  variables  can  be  computed  by 
adding  the  corresponding  variables  for  the  individual 
regions.  This  process  leads  to  an  efficient  algorithm. 
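
The additivity that makes merging cheap can be shown directly. In the sketch
below, a region is summarized by nine running sums for the plane
z = a*x + b*y + c: the six distinct entries of the symmetric 3 x 3
normal-equation matrix plus the three right-hand-side terms. This bookkeeping
is an assumption of the example and may differ from the nine variables used
by Laprade; the function names are hypothetical.

```python
def region_sums(pixels):
    """Accumulate nine sums for pixels given as (x, y, z) triples."""
    s = [0.0] * 9  # Sxx, Sxy, Sx, Syy, Sy, N, Sxz, Syz, Sz
    for x, y, z in pixels:
        terms = (x * x, x * y, x, y * y, y, 1.0, x * z, y * z, z)
        for i, term in enumerate(terms):
            s[i] += term
    return s


def merge_sums(s1, s2):
    """Sums for the union of two disjoint regions: simple addition."""
    return [p + q for p, q in zip(s1, s2)]


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(s):
    """Solve the normal equations for (a, b, c) by Cramer's rule."""
    sxx, sxy, sx, syy, sy, n, sxz, syz, sz = s
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(m)  # zero if the pixels are collinear; not handled here
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)
```

The plane for a merged region therefore costs nine additions and one small
linear solve, regardless of how many pixels the two regions contain.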

Features  that  differ  from  the  standard  split- 
and-merge  algorithm  are  described  in  Laprade’s 
paper.  One  such  feature  is  the  use  of  multiple 
predicates,  specifically  the  mean  predicate  and 
F-test, at certain stages of the algorithm. Regions
are  allowed  to  merge  with  other  regions  during  the 
region-growing  process,  as  opposed  to  the  usual 
practice  of  allowing  a  growing  region  to  absorb 
only  quads  that  have  not  been  assigned  to  another 
region.  Multiple  predicates  were  used  because  the 
F-test  is  not  sensitive  to  the  magnitude  of  differences 
between  regions,  only  to  their  uniformity. 

In  other  words,  two  regions  may  differ  only 
slightly  in  their  means  or  slopes,  but  if  the  residuals 
of  these  two  regions  from  their  facet  fits  are  much 
smaller  than  these  differences,  then  the  F-test  will 
classify  these  regions  as  being  distinct.  In  addition 
to the F-test, two regions were compared by looking
at the maximum difference between their facet
representations  at  points  where  the  regions  were 
adjacent. 

In  general,  the  results  of  this  technique  offer  an 
improvement  over  the  flat  facet  results.  However, 
there  is  one  problem  associated  with  this  technique: 
the  splitting  procedure  finds  very  few  uniform  areas 
from  which  to  grow  regions.  Such  oversegmentation 
is  necessary  to  ensure  that  quads,  which  include 
edges  of  regions,  are  split.  If  one  region  contributes 
only  a  small  part  of  a  quad’s  area,  then  the  threshold 
that  controls  splitting  must  be  very  tight  to  ensure 
that  splitting  occurs.  If  the  output  of  an  edge  detector 
is  used  to  ensure  that  quads  containing  edge  pixels 
are  split,  this  key  problem  may  be  greatly  reduced. 

5.13.4 Other Split-and-Merge Approaches

Taylor  et  al.39  developed  a  technique  to  rapidly 
divide  surfaces  in  range  imagery  into  regions  that 
satisfy  a  common  homogeneity  criterion.  The  result 
is  a  segmentation  of  the  range  information  into 
approximately  planar  surface  regions.  Key  features 
that  enhance  the  algorithm’s  speed  include  the 
development  of  appropriate  region  descriptors  and 
the  use  of  fast  region  comparison  techniques  for 
segmentation decisions. Their algorithm takes a split-and-merge
approach, where the homogeneity criterion
is based on three planar surface description
parameters:  two  angles  (to  describe  the  orientation 
of  the  normal  to  the  local  best-fit  plane)  and  the 
original  range  value.  Speed  is  achieved  because  both 
region-splitting  and  the  rejection  of  merge 
possibilities  can  often  be  based  on  simple 
comparisons  of  only  the  two  orientation  parameters. 
Another  fast,  but  more  complex,  region-to-region 
range  continuity  test  is  developed  for  inconclusive 
orientation  tests. 

In  Taylor  et  al.,  the  concept  of  the  split- 
and-merge  technique  is  extended  from  gray-level 
imagery  to  range  imagery.  The  algorithm  segments 
range  images  into  a  set  of  planar  surface  regions 
by  using  efficient  planar  region  comparison  and 
multiple merge phases. Splitting and merging are
done  conservatively,  yielding  oversegmented  images. 
Extra  region  boundaries  are  detected,  but  no  major 
boundaries  are  broken;  thus,  the  oversegmentation 
is  due  to  the  fragmentation  of  true  regions.  Additional 
merge  phases  with  relaxed  homogeneity  criteria  are 
then used to reduce this fragmentation. No general
procedure for selecting the relaxed criterion values is offered;
however, it is suggested that the values are not very
sensitive for data that are actually planar, as long
as the general relaxation trends indicated by the
authors are followed.

The  algorithm  is  effective  in  the  presence  of  a 
large  amount  of  noise;  for  this  situation,  image 
filtering  becomes  important.  Two  central  filtering 
techniques are discussed: mean filtering is used first,
followed  by  median  filtering.  Also,  varying  window 
sizes  are  used  when  the  noisy  image  data  are  filtered. 
Other  filtering  possibilities  are  suggested,  including 
filtering  in  parameter  space  and  a  hierarchical, 
multiple-window,  planar  estimation  scheme. 
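
The filtering order described above (mean first, then median, with a
selectable window size) can be sketched in a few lines. Border handling by
clipping the window, the default window sizes, and the function names are
assumptions of this example rather than details from Taylor et al.

```python
import statistics


def window_filter(img, size, reducer):
    """Apply `reducer` to each size x size neighborhood (clipped at borders)."""
    h, w, k = len(img), len(img[0]), size // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - k), min(h, r + k + 1))
                    for cc in range(max(0, c - k), min(w, c + k + 1))]
            row.append(reducer(vals))
        out.append(row)
    return out


def mean_then_median(img, mean_size=3, median_size=3):
    """Mean filtering followed by median filtering, as two separate passes."""
    smoothed = window_filter(img, mean_size, statistics.fmean)
    return window_filter(smoothed, median_size, statistics.median)
```

Passing different `mean_size` and `median_size` values gives a simple way to
experiment with the varying window sizes mentioned in the text.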

Merging  options  available  for  the  split-and-merge 
algorithm  have  varying  effects  on  the  results: 

• Forcing single-pixel regions to merge with their
best-match neighbors improves the quality of the
segmentation, particularly at the boundaries of true
regions.

•  Allowing  the  algorithm  to  merge  regions  as  they 
are  encountered,  rather  than  in  descending  order  of 
size, works best.

•  Considering  the  neighbors  of  a  region  by  angle 
value  closeness  for  merging  slightly  degrades  the 
algorithm’s  performance. 

• Performing multiple mergings with successively
relaxed homogeneity criteria dramatically improves
the  segmentation  results. 

The  results  from  Taylor’s  algorithm  suggest  that 
it  is  a  potential  "front-end"  stage  for  a  more  complex 
merging  technique. 

5.14  Object  Recognition  in  3-D  Space 

Hoffman  and  Jain40  describe  the  recognition  of 
objects  in  3-D  space  for  use  in  computer  vision 
systems.  Range  images,  which  directly  measure 
3-D  surface  coordinates  of  a  scene,  are  well  suited 
for  this  task.  The  authors  present  a  procedure  to 
detect  connected  planar,  convex,  and  concave 
surfaces  of  3-D  objects.  Their  algorithm  is 
implemented  in  three  stages.  The  first  stage  segments 
the range image into “surface patches” by a square-error
criterion clustering algorithm that uses surface
points and associated surface normals. The second
stage classifies these patches as planar, convex, or
concave; the classification is based on a nonparametric
statistical test for trend, curvature values,
and eigenvalue analysis. In the final stage, boundaries
between adjacent surface patches are classified as
crease or noncrease edges, and this information is
then  used  to  merge  compatible  patches  and  produce 
reasonable  faces  of  the  object(s). 
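
The final stage can be illustrated with a deliberately simplified sketch. It
labels the boundary between two adjacent patches as a crease when the angle
between their mean unit normals exceeds a threshold, and then unions patches
joined by noncrease boundaries. Hoffman's and Jain's actual crease test and
their patch-compatibility rules (which also use the planar, convex, or concave
labels) are not reproduced; the threshold and function names are hypothetical.

```python
import math


def is_crease(n1, n2, angle_thresh_deg=25.0):
    """Label the boundary between two patches from their mean unit normals."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(n1, n2))))
    return math.degrees(math.acos(dot)) > angle_thresh_deg


def merge_patches(normals, adjacency, angle_thresh_deg=25.0):
    """Union patches joined by noncrease boundaries; returns a face id per patch."""
    parent = list(range(len(normals)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in adjacency:
        if not is_crease(normals[i], normals[j], angle_thresh_deg):
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(normals))]
```

For example, two coplanar patches that were split by the clustering stage end
up with the same face id, while a patch on a perpendicular side keeps its own.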

The authors demonstrate that a square-error
criterion clustering algorithm performs well for
segmenting  a  variety  of  range  images  into  patches. 
Information  is  provided  about  the  geometric  structure 
of  objects  by  producing  surface  patches  that  do  not 
cross  over  natural  jump  or  crease  edges.  They  chose 
clustering  to  implement  their  segmentation  phase 
of  the  algorithm  because  clustering  was  found  to 
perform  better  than  any  edge-based  technique. 
Among  all  possible  clustering  algorithms,  the  authors 
chose  one,  appropriately  named  CLUSTER,  which 
was  developed  by  Dubes  and  Jain.41 

Hoffman’s  and  Jain’s  procedure  is  useful  for  object 
representation  and  recognition  based  on  surface 
primitives.  The  authors  used  the  method  specifically 
to  find  natural  object  faces  in  a  range  image.  The 
procedure  segments  and  classifies  range  images  and 
merges the resultant surface patches. The clustering
algorithm  chosen  was  shown  to  be  effective  over  a 
variety  of  range  images  for  partitioning  an  image 
into  surface  patches  that  do  not  cross  over  crease 
or  jump  edges.  A  classification  of  these  surface 
patches  as  planar,  convex,  or  concave  is  strongly 
based on a nonparametric statistical test. Although
the test is simple in concept, it proves to be a
powerful tool for this application. A disadvantage
of the nonparametric test, however, is the need for
moderate  sample  sizes,  which,  in  this  technique, 
translates  into  moderate  patch  sizes.  The  decision 
tree  for  patch  classification  is  designed  to  fall  back 
on  curvatures  and  eigenvalues  if  a  patch  is  too  small 
to  make  a  meaningful  decision  by  the  trend  test.  A 
crease  edge-detection  technique  is  used  to  guide 
the  reconstruction  of  the  natural  object  faces,  which 
were  oversegmented  by  the  cluster  technique.  It  was 
noted  that  the  eigenvalue  threshold  may  have  to  be 
increased  for  those  images  in  which  only  a  small 
portion  of  the  available  depth  values  are  used. 

5.15  Shadow  Boundary  Segmentation 

Hambrick  et  al.42  have  documented  a  new 
technique  to  interpret  arbitrarily  shaped  surfaces 
by  segmenting  and  labeling  the  shadow  boundary. 
The  technique  is  called  the  Entry-Exit  Method  of 
Shadow Boundary Segmentation, and its distinguishing
attributes can be summarized as follows:

•  Extracts  shape-related  information  from  the 
shadow  cast  by  arbitrarily  shaped  objects  on  known 
surfaces. 

• Works independently of a priori knowledge of the
scene,  requiring  only  the  shadow  boundary  and 
the  illumination  vector. 

• Provides a general structure for shadow boundaries
by identifying the basic segments and
establishing  the  relationships  among  them. 

• Defines the minimum set of segment types
required  to  describe  and  interpret  a  shadow. 

•  Delineates  the  possible  identities  of  isolated 
boundary  segments. 

•  Automatically  recognizes  ambiguities  caused  by 
occlusions,  coincidences,  and  intermediate  errors. 

The  shadow-handling  technique  is  based  on  the 
key  principle  that  each  point  on  a  shadow  boundary 
is  either  an  entry  or  an  exit  point.  That  is,  a  light 
ray  projected  across  the  boundary  would  either  enter 
or exit the shadow at that point. Segments consisting
of entry points are called entry segments; they face
toward the light source. Segments of exit points
are called exit segments; they face away from the
light source. A pair of entry and exit segments whose
end points are aligned along light rays comprises a
shadow-making line and its corresponding shadow
line. An exit segment connected to the shadow-making
line is an occluding line. An entry segment
connected  to  the  shadow  line  is  the  shadow  of  a 
hidden  shadow-making  line. 
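
The entry/exit principle reduces to a sign test once the boundary and the
illumination direction are known. The sketch below assumes that the shadow
boundary is a polygon listed counter-clockwise in a y-up coordinate system and
that `light_dir` is the direction in which light travels, projected into the
same plane; these conventions, the "grazing" label for the degenerate case,
and the function name are assumptions of the example.

```python
def label_boundary(polygon, light_dir):
    """Label each edge of a counter-clockwise shadow polygon."""
    labels = []
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        # Outward normal of a counter-clockwise polygon edge.
        nx, ny = y1 - y0, -(x1 - x0)
        dot = light_dir[0] * nx + light_dir[1] * ny
        if dot < 0:
            labels.append("entry")    # edge faces the light source
        elif dot > 0:
            labels.append("exit")     # edge faces away from the light source
        else:
            labels.append("grazing")  # light ray runs along the edge
    return labels
```

Consecutive edges with the same label can then be grouped into entry and exit
segments, and pairs whose end points line up along `light_dir` are candidates
for a shadow-making line and its shadow line.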

A  thresholding  technique  was  developed  by 
Perez43  for  segmenting  digital  images  with  bimodal 
reflectance  distributions  under  nonuniform 
illumination.  The  algorithm  works  in  a  raster  format, 
thus  making  it  an  attractive  segmentation  tool  in 
situations  requiring  fast  data  throughput.  The 
theoretical  base  for  this  algorithm  is  a  recursive 
Taylor  expansion  of  a  continuously  varying  threshold 
tracking  function. 

5.16  Segmentation  of  Handwritten 
Numerical  Strings 

Shridhar  and  Badreldin44  give  a  context-directed 
segmentation  algorithm  for  handwritten  numerical 
strings,  in  which  connected  numerical  strings  are 
split  into  their  key  components.  The  algorithm  is 
hierarchical  in  that  it  tests  various  hypotheses  ranging 
from  the  case  in  which  the  numerals  are  completely 
isolated  to  that  in  which  the  numerals  may  be 
connected,  touching,  or  existing  in  overlapping  fields. 

Test  results  for  this  technique  revealed  that  on 
200  numerals,  an  accuracy  of  92%  was  obtained. 
The  recognition  errors  were  mainly  due  to  the 
pseudofeatures generated by the connecting tail that
was  still  present  after  segmentation.  The  authors 
felt  that  these  errors  could  be  reduced  by  modifying 
the  recognition  algorithm  to  account  for  the 
pseudofeatures.  Errors  could  also  be  reduced  by 
providing  information  to  the  recognition  stage 
segmenter  that  would  indicate  where  the  numerals 
were  disconnected,  thus  allowing  the  algorithm  to 
predict  where  the  features  of  each  numeral  might 
be  computed. 

Many  assumptions  were  made  in  this  test.  The 
algorithm  assumes  that  the  numeral  strings  were 
written  in  a  “normal”  way,  i.e.,  each  numeral  in  the 
string had roughly the same height. It was also
assumed  that  the  numerals  were  written  on  a  specified 
line  with  orientations  limited  to  20  degrees  from 
the  vertical.  Finally,  the  number  of  numerals  in  the 
string  must  have  been  specified  prior  to  processing. 

6.0 Template-Matching Segmentation

One  direct  method  of  segmenting  an  image  is  to 
match  it  against  templates  from  a  given  list.  The 
detected  objects  can  then  be  segmented  out,  and 
the  remaining  image  can  be  analyzed  by  other 
techniques.  This  method  can  be  used  to  segment 
busy  images,  such  as  journal  pages  that  contain 
text  and  graphics.  The  text  can  be  segmented  by 
template-matching  techniques  and  graphics  can  be 
analyzed  by  boundary-following  algorithms.1 
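
A bare-bones version of this match-then-remove idea is sketched below. It
slides a template over the image, scores each position by the sum of absolute
differences (SAD), and blanks out the matched pixels so that the remainder can
be passed to another technique. The SAD score, the tolerance parameter, and
the function names are choices made for this example.

```python
def match_template(image, template, max_sad=0):
    """Return top-left (row, col) positions where the template matches."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            sad = sum(abs(image[r + i][c + j] - template[i][j])
                      for i in range(th) for j in range(tw))
            if sad <= max_sad:
                hits.append((r, c))
    return hits


def mask_out(image, template, hits, fill=0):
    """Remove detected objects so the remainder can go to other techniques."""
    out = [row[:] for row in image]
    for r, c in hits:
        for i in range(len(template)):
            for j in range(len(template[0])):
                if template[i][j]:  # only blank the template's own pixels
                    out[r + i][c + j] = fill
    return out
```

For a page of text and graphics, each character shape would be one template
in the list, and whatever survives `mask_out` would be treated as graphics.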

In 1980, Tsuji et al.45 documented a dynamic
scene analyzer that separated moving objects from
the background and analyzed their motion patterns
in dynamic line images (such as animated films). Since
the  objects  move  and  rotate  in  a  3-D  world,  occlusion 
often  occurs,  and  the  shapes,  sizes,  and  structures 
of  the  moving  object  images  change  from  frame  to 
frame.  The  background  and  stationary  objects  may 
also  move  in  the  images  due  to  movements  of  the 
camera  while  tracking  interesting  objects.  The  task 
of  the  analyzer  is  to  segment  the  scene  into 
meaningful  constituents  and  to  obtain  a  structural 
description  of  each  object  that  contains  properties, 
spatial  relations,  and  motion  patterns. 

This  flexible  template-matching  method  finds 
correspondence  between  regions  and  their  respective 
segments  in  a  sequence  of  input  frames.  Also,  the 
analyzer  tracks  the  moving  regions  and  segments 
within  the  dynamic  images.  A  similarity  test  of 
segment  movements  then  detects  the  background 
movement  and  classifies  the  regions  into  stationary 
and  nonstationary.  Each  region  in  the  latter  group 
is  further  labeled  as  partly  occluded,  false,  or  moving 
by  examining  both  the  motion  patterns  of  its 
segments  and  the  temporal  change  of  its  structure. 
Finally,  the  analyzer  merges  the  segments  of  each 
moving  object  into  groups  with  similar  motion 
patterns  to  obtain  a  meaningful  partition  that 
corresponds  to  its  components,  such  as  hands  or 
legs. 

The  authors  conclude  by  stating  that  the  system 
is  primitive  because  a  simple  scene  model  is  used. 
They  list  important  problems  for  future  related 
research,  such  as  analysis  of  the  rotations  in  the 
3-D  world,  analysis  of  the  structural  change  of 
the  line  image,  semantic  interpretation  of  moving 
and  stationary  regions,  and  understanding  the 
meaning  of  the  movements. 

7.0  Texture  Segmentation 

Texture  segmentation  becomes  important  when 
objects in a scene have a textured background. Since
texture often contains a high density of edges,
boundary-based  techniques  may  become  ineffective 
unless  the  texture  is  filtered  out.  Clustering  and 
region-based  approaches  applied  to  textured  features 
can  be  used  to  segment  textured  regions.  In  general, 
texture  segmentation  and  classification  is  a 
complicated  problem.  Use  of  a  priori  knowledge 
about  the  existence  and  kinds  of  textures  that  may 
be  present  in  a  scene  can  be  beneficial  when  applied 
to  practical  problems.1 

Raafat  and  Wong46  present  a  new  method  for 
image  segmentation  and  region  classification  based 
on  the  texture  content  of  different  regions  in  an 
image.  This  technique  uses  a  new  measure  of  texture 
information  to  initiate  texturally  homogeneous  core 
regions.  Next,  the  information  measure,  together 
with  a  new  texture  distance  measure  (known  as  the 
event  set  distance)  is  used  to  direct  the  growth  of 
various  homogeneous  regions.  Since  the  texture 
information  measure  reflects  both  the  local  and  global 
properties  of  an  image,  the  segmentation  process  is 
highly  adaptable  to  various  images.  The  event  set 
distance  is  defined  over  a  set  of  gray-level  and 
gradient  vector  histograms  derived  from  the  texture 
content  within  image  blocks.  The  method  is  data- 
directed,  computationally  efficient,  and  operationally 
flexible  to  accommodate  various  textural  properties 
and  distances.  Their  algorithm  for  segmenting  and 
classifying  textured  images  is  based  on  the  low- 
level  vision  approach,  where  no  a  priori  knowledge 
is  available  about  the  number  and  types  of  textures 
present  in  the  image.  This  technique  is  a  region- 
growing  method  directed  by  the  texture  information 
inherent  in  various  regions  of  the  image  (i.e.,  it  is 
based  solely  upon  the  relative  visual  characteristics 
of  the  image).  The  authors  note  that  from  experiments 
with  this  technique,  the  algorithm  proved  effective 
and  efficient. 

7.1  Shift-Match  Method 

An  approach  to  the  segmentation  of  dynamic 
scenes  that  contain  textured  objects  moving  against 
a  textured  background  is  presented  by 
Jayaramamurthy and Jain.47 Their multistage
approach  first  uses  a  differencing  operation  to  obtain 
active  regions  in  the  frames  that  contain  moving 
objects.  In  the  next  stage,  an  HT  technique  is  used 
to  determine  the  motion  parameters  associated  with 
each  active  region.  Finally,  the  intensity  changes 
and  motion  parameters  are  combined  to  obtain  masks 
of  the  moving  objects. 

The approach to recovering masks of moving
objects  is  referred  to  as  the  “shift-match  method,” 
which  does  not  require  prior  segmentation  of  frames. 
This  technique  depends  solely  on  motion  to  obtain 
segmentation.  It  seems  to  have  performed  well  in  a 
textured  environment,  even  in  the  presence  of 
occlusion. 
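
The first stage described above, the differencing operation, can be sketched as follows. This is a minimal illustration that assumes two gray-level frames of equal size and a fixed threshold; the function names and the threshold value are hypothetical, and the later stages (the HT-based motion estimate and the combination into object masks) are not shown.

```python
def active_mask(prev_frame, next_frame, threshold=20):
    """Mark pixels whose intensity changed by more than `threshold`."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]


def bounding_box(mask):
    """Bounding box (top, left, bottom, right) of marked pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# A bright pixel moves one column to the right between the two frames.
frame0 = [
    [50, 50, 50, 50],
    [50, 200, 50, 50],
    [50, 50, 50, 50],
]
frame1 = [
    [50, 50, 50, 50],
    [50, 50, 200, 50],
    [50, 50, 50, 50],
]
mask = active_mask(frame0, frame1)
```

The active region covers both the old and the new position of the object, which is why differencing alone does not yield the object mask and a motion estimate is needed afterward.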

7.2  Segmentation  Using  Temporal  Textures 

Samy  et  al.48  give  an  image  sequence  segmentation 
algorithm  for  the  analysis  of  dynamic  textured 
scenes.  A  measure  of  time-varying  textures,  based 
on  classical  spatial  texture  measures,  and  temporal 
filters  to  enhance  moving  regions  are  also  given. 
Their  segmentation  algorithm  is  a  straightforward 
extension  of  real-time  target-tracking  algorithms 
based  on  adaptive  statistical  clustering. 

Hyde  et  al.49  give  a  means  of  producing  a 
multiresolution,  multipredicate  representation  of 
image  data.  They  demonstrated  that  object 
segmentation  and  classification  can  be  represented 
in  the  context  of  image  interpretation.  In  the  same 
computational  process,  they  also  showed  that 
multiresolution  data  can  be  provided  at  successively 
decreasing  resolutions  to  provide  region  segmentations 
or,  equivalently,  edge  segmentations  as 
required.  This  method  has  also  been  used  for  color 
segmentation  and  extended  to  the  temporal  domain 
to  provide  optical  flow  information. 
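
The idea of providing data at successively decreasing resolutions can be illustrated with a simple image pyramid. The sketch below assumes 2x2 block averaging as the reduction step, which is a common choice and not necessarily the one used by Hyde et al.; the multipredicate part of their representation is not shown, and the function names are hypothetical.

```python
def reduce_once(image):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image):
    """Return all levels, from full resolution down to a single row or column."""
    levels = [image]
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(reduce_once(levels[-1]))
    return levels


image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
pyramid = build_pyramid(image)
```

Here the 4x4 image reduces to a 2x2 level in which each homogeneous quadrant becomes one pixel, and then to a single pixel holding the global mean.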

7.3  Markov  Random  Fields  in 
Texture  Segmentation 

Cohen  and  Cooper33  suggest  simple,  parallel, 
hierarchical  and  relaxation  algorithms  to  segment 
noncausal  MRFs.  Two  conceptually  new  algorithms 
are  presented  for  segmenting  textured  images  into 
individual  regions.  The  data  from  the  regions  are  then 
modeled  as  an  MRF.  The  algorithms  are  designed 
to  operate  in  real  time  when  implemented  on  parallel 
computer  architectures. 

A  doubly  stochastic  representation  is  used  in 
the  image  modeling  process.  Cohen  and  Cooper  use 
a  Gaussian  MRF  to  model  textures  in  visible  light 
and  infrared  images,  and  an  autobinary  (or 
autoternary)  MRF  to  model  a  priori  information 
about  the  local  geometry  of  the  textured  image 
regions.  For  image  segmentation,  the  true  texture 
class  regions  are  treated  a  priori  either  as 
completely  unknown  or  as  a  realization  of  a  binary 
(or  ternary)  MRF.  In  the  former  case,  image 
segmentation  is  realized  as  true  maximum  likelihood 
segmentation.  In  the  latter  case,  the  segmentation 
is  realized  as  true  maximum  a  posteriori  likelihood 
segmentation. 
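
The distinction between the two cases can be written compactly. The notation here is an assumption introduced for illustration and does not come from Cohen and Cooper: y denotes the observed image, x the field of texture class labels, p(y | x) the texture model (the Gaussian MRF), and p(x) the prior on the label field (the binary or ternary MRF).

```latex
\hat{x}_{\mathrm{ML}} = \arg\max_{x} \; p(y \mid x),
\qquad
\hat{x}_{\mathrm{MAP}} = \arg\max_{x} \; p(x \mid y)
                       = \arg\max_{x} \; p(y \mid x)\, p(x)
```

The second equality follows from Bayes' rule, since p(y) does not depend on x. When nothing is assumed about the regions, p(x) is effectively constant and the maximum a posteriori estimate reduces to the maximum likelihood estimate.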

7.4  Texture  Segmentation  Using 
Fractal  Geometry 

Keller  and  Chen50  give  a  new  method  for 
estimating  the  fractal  dimension  from  image  surfaces 
and  show  that  the  method  describes  and  segments 
generated  fractal  sets  well.  Since  the  fractal 
dimension  alone  is  not  sufficient  to  characterize 
natural  textures,  a  new  class  of  texture  measures 
based  on  the  concept  of  lacunarity  is  defined  and 
used  in  conjunction  with  the  fractal  dimension  to 
describe  and  segment  natural  texture  images.  They 
also  developed  new  methods  for  computing  the 
fractal  dimension  and  lacunarity.  Finally,  they  state 
that  equivalent  performance  could  be  obtained  by 
using  a  supervised  segmentation  algorithm  and 
perhaps  including  other  texture  features. 
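
For readers unfamiliar with the fractal dimension, the sketch below estimates it for a binary point set by classical box counting. This is a standard textbook estimator substituted here for illustration; it is not the new method of Keller and Chen, which operates on image surfaces, and lacunarity is not computed. The function names and box sizes are hypothetical.

```python
import math


def box_count(points, size):
    """Number of size-by-size boxes that contain at least one point."""
    return len({(r // size, c // size) for r, c in points})


def box_counting_dimension(points, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity checks on sets whose dimension is known: a line and a filled square.
line = [(0, c) for c in range(16)]
square = [(r, c) for r in range(16) for c in range(16)]
```

The estimator returns a value close to 1 for the line and close to 2 for the filled square; textured sets would fall in between. Two textures can share the same dimension, which is the motivation given above for adding lacunarity-based measures.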

8.0  Discussion 

Several  digital  image  segmentation  experiments 
were  recently  completed  at  the  Naval  Oceanographic 
and  Atmospheric  Research  Laboratory  using 
acoustical  imagery.  These  experiments  confirmed 
the  hypothesis  that  combinations  of  digital  image 
segmentation  techniques  must  be  used  to  adequately 
differentiate  common  geoacoustic  regions.51  The 
findings  are  briefly  summarized: 

•  Texture-based  segmentation  can  be  applied  to 
acoustical  imagery  to  yield  additional  segmentations 
compared  with  using  the  intensity  values  alone. 

•  Texture  measures  can  be  applied  to  an  acoustical 
image  to  yield  texture  bands  that  can  be  combined 
to  form  a  multiband  image,  which  then  can  be  used 
as  input  to  these  conventional  digital  image 
segmentation  techniques. 

•  Clustering  techniques  are  effective  segmenters 
for  acoustical  imagery  and  yield  complementary 
segmentations  when  combined  with  the  multitexture 
band  image.  This  type  of  hybrid  segmentation 
was  effective  because  it  combined  the  texture  or 
spatial  statistical  information  gained  from 
texture-based  segmentation  with  a  region-based 
clustering  segmentation  algorithm. 
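
The hybrid scheme summarized in these findings can be sketched as follows: texture measures are computed per pixel, stacked with the intensity into a multiband image, and the resulting feature vectors are clustered. The particular texture measures (3x3 local mean and variance), the use of k-means, and all function names are assumptions chosen for illustration; they are not taken from the experiments cited above.

```python
def local_stats(image, r, c):
    """Mean and variance of the 3x3 neighborhood of (r, c), clipped at edges."""
    values = [
        image[i][j]
        for i in range(max(0, r - 1), min(len(image), r + 2))
        for j in range(max(0, c - 1), min(len(image[0]), c + 2))
    ]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def multiband(image):
    """One (intensity, local mean, local variance) vector per pixel."""
    return [
        [(image[r][c],) + local_stats(image, r, c) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def kmeans(vectors, centers, iterations=10):
    """Plain k-means on unscaled features; returns a cluster index per vector."""
    labels = [0] * len(vectors)
    for _ in range(iterations):
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centers[k])),
            )
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Left half smooth, right half a 50/150 checkerboard with the same average.
image = [
    [100 if c < 4 else (50 if (r + c) % 2 == 0 else 150) for c in range(8)]
    for r in range(4)
]
vectors = [v for row in multiband(image) for v in row]
labels = kmeans(vectors, [vectors[0], vectors[-1]])
```

In this toy image the two halves have the same average intensity, so it is the variance band that separates them. With real imagery the bands would normally be rescaled before clustering so that no single band dominates the distance.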

9.0  Summary  and  Conclusions 

Six  principal  categories  of  digital  image  segmentation 
have  been  surveyed,  with  emphasis  on  the 
techniques  appearing  in  the  technical  literature  that 
relate  to  Navy  seafloor  segmentation/classification. 
Many  of  the  papers  provided  new  algorithms  for 
addressing  particular  segmentation  tasks. 

Specific  conclusions  from  this  study  are  as 
follows: 

•  Hybrid  techniques  (a  combination  of  two  or  more 
of  the  discussed  segmentation  techniques)  should 
prove  effective  on  some  seafloor  segmentation/classification 
problems.  One  example  is  the  combination  of  the 
advantages  of  region-based  segmentation  (which 
adapts  to  the  region)  with  the  benefits  of  the 
texture-based  techniques,  which  normally  require 
a  priori  knowledge  for  full  utilization. 

•  Current  NOARL  research  programs  can  build 
on  the  texture-based  approaches  outlined,  as  well 
as  the  material  on  region-based  segmentation  and 
Hough  Transform  utilization  for  improved  edge 
detection. 

•  Real-time  implementation  of  the  region-based 
techniques  will  require  special  emphasis  on  the 
merging  rules,  since  this  task  is  computationally 
intensive. 

•  Amplitude  thresholding  alone  will  not  provide 
the  discrimination  necessary  for  geoacoustic  province 
selection  given  sidescan  sonar  imagery,  since  more 
information  than  the  histogram  alone  must  be 
examined. 

•  The  most  useful  boundary-based  segmentation 
technique  for  current  seafloor  acoustic  imagery 
exploitation  is  the  Hough  Transform  (in  the 
formulation  that  also  handles  vertical  lines). 

•  Three-dimensional  object  recognition  techniques 
described  by  Hoffman  and  Jain40  should  prove  useful 
for  3-D  geoacoustic  province  selection  for  sidescan 
sonar  imagery. 

•  Template-matching  segmentation  techniques 
could  be  useful  for  very  simple  and  well-defined 
regions  but  not  for  the  rapidly  changing  ocean  bottom 
and  water  column. 

•  Fractal  geometry  (e.g.,  fractal  dimension)  can 
be  used  to  augment  other  segmenting  descriptors 
for  seafloor  acoustic  imagery.  As  noted  in  this  paper, 
fractal  dimension  alone  will  not  sufficiently 
characterize  natural  textures. 
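
As an illustration of the remark above on a Hough Transform formulation that handles vertical lines, the sketch below uses the normal parameterization rho = x cos(theta) + y sin(theta). It is assumed here that this is the formulation meant; unlike the slope-intercept form, it represents a vertical line x = c with finite parameters (theta = 0, rho = c). The function name and the one-degree angular resolution are hypothetical choices.

```python
import math


def hough_peak(points, n_theta=180):
    """Vote in (rho, theta) space; return (votes, rho, theta_degrees) of a peak."""
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    (rho, t), count = max(votes.items(), key=lambda kv: kv[1])
    return count, rho, 180.0 * t / n_theta


# Ten edge points on the vertical line x = 5, which has no finite slope.
vertical = [(5, y) for y in range(10)]
count, rho, theta_deg = hough_peak(vertical)
```

All ten points vote for a cell with rho = 5 and theta at or near zero degrees. Because of the coarse quantization, neighboring angles can tie with the true cell on a line this short; a practical implementation would smooth the accumulator or use longer edge segments.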